1. Introduction

Let $X_1,\dots,X_n$ be independent real-valued zero-mean random variables (r.v.'s)
such that $X_i \le y$ almost surely (a.s.) for some $y > 0$ and all $i$. Let
$S := X_1 + \dots + X_n$ and assume that $\sigma := \sqrt{\sum_i \mathsf{E} X_i^2} \in (0,\infty)$.
The Bennett-Hoeffding [1, 26] inequality states that

$$\mathsf{P}(S \ge x) \le \mathrm{BH}(x) := \mathrm{BH}_{\sigma^2,y}(x) := \exp\Big\{-\frac{\sigma^2}{y^2}\,\psi\Big(\frac{xy}{\sigma^2}\Big)\Big\} \tag{1.1}$$

for all $x \ge 0$, where

$$\psi(u) := (1+u)\ln(1+u) - u; \tag{1.2}$$

see e.g. [1] concerning the importance of such bounds. Inequality (1.1) has been
generalized to include cases when the $X_i$'s are not independent and/or are not
real-valued; see e.g. [12, 13, 14, 16, 18, 20, 23, 27, 28, 29, 30, 50, 35, 37, 55, 56, 57].

Inequality (1.1) is based on an upper bound on the exponential moments of $S$:

$$\mathsf{E} e^{\lambda S} \le \mathrm{BH}_{\exp}(\lambda) := \exp\Big\{\frac{e^{\lambda y} - 1 - \lambda y}{y^2}\,\sigma^2\Big\} \quad\text{for all } \lambda \ge 0; \tag{1.3}$$

that is,

$$\mathrm{BH}(x) = \inf_{\lambda \ge 0} e^{-\lambda x}\, \mathrm{BH}_{\exp}(\lambda). \tag{1.4}$$
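The relation between (1.1), (1.3) and (1.4) can be checked numerically. The following is a minimal sketch, not part of the original argument; the function names are illustrative, and the minimizer $\lambda = \ln(1 + xy/\sigma^2)/y$ used below is obtained by setting the $\lambda$-derivative of the exponent in (1.4) to zero.

```python
import math

def psi(u):
    """psi(u) = (1 + u) * ln(1 + u) - u, as in (1.2)."""
    return (1.0 + u) * math.log1p(u) - u

def bh(x, sigma, y):
    """Bennett-Hoeffding tail bound BH(x) from (1.1)."""
    s2 = sigma * sigma
    return math.exp(-(s2 / (y * y)) * psi(x * y / s2))

def bh_exp(lam, sigma, y):
    """Exponential-moment bound BH_exp(lambda) from (1.3)."""
    return math.exp((math.exp(lam * y) - 1.0 - lam * y) / (y * y) * sigma * sigma)

def bh_from_infimum(x, sigma, y):
    """Right-hand side of (1.4), evaluated at the minimizing lambda
    (assumption of this sketch: lambda = ln(1 + x*y/sigma^2) / y)."""
    lam = math.log1p(x * y / (sigma * sigma)) / y
    return math.exp(-lam * x) * bh_exp(lam, sigma, y)

if __name__ == "__main__":
    for x in (0.5, 2.0, 7.0):
        print(x, bh(x, 1.3, 0.7), bh_from_infimum(x, 1.3, 0.7))
```

The two printed columns should agree up to rounding, which is exactly the content of (1.4).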

Attempts at refining the Bennett-Hoeffding inequality by taking moments
higher than the second ones into consideration were made in [25, 24, 32, 59];
however, in contrast with the Bennett-Hoeffding bounds, the bounds given in
[25, 24, 32, 59] were not the best possible in their own terms. Such best possible,
exact bounds refining the Bennett-Hoeffding ones were obtained by Pinelis and
Utev [52, Theorems 2 and 6]. In particular, [52, Theorem 2] implies that

$$\mathsf{E} e^{\lambda S} \le \mathrm{PU}_{\exp}(\lambda) := \exp\Big\{\frac{\lambda^2}{2}(1-\varepsilon)\sigma^2 + \frac{e^{\lambda y} - 1 - \lambda y}{y^2}\,\varepsilon\sigma^2\Big\} \quad \forall \lambda \ge 0, \tag{1.5}$$

where

$$\varepsilon := \frac{\beta_3^+}{\sigma^2 y}, \qquad \beta_3^+ := \sum_i \mathsf{E}(X_i)_+^3,$$

$x_+ := 0 \vee x = \max(0, x)$ and $x_+^\alpha := (x_+)^\alpha$, whence for all $x \ge 0$

$$\mathsf{P}(S \ge x) \le \mathrm{PU}(x) := \inf_{\lambda \ge 0} e^{-\lambda x}\, \mathrm{PU}_{\exp}(\lambda). \tag{1.6}$$

Note that $\varepsilon \in (0, 1)$. Hence, and because $\frac{\lambda^2}{2} < \frac{e^{\lambda y} - 1 - \lambda y}{y^2}$ for all $\lambda > 0$ and $y > 0$,
the Pinelis-Utev upper bounds $\mathrm{PU}_{\exp}(\lambda)$ and $\mathrm{PU}(x)$ are always less than the
Bennett-Hoeffding upper bounds $\mathrm{BH}_{\exp}(\lambda)$ and $\mathrm{BH}(x)$, respectively. Moreover,
the PU bounds may be significantly less than the BH ones; this happens when
$\varepsilon$ is much less than 1, which in particular is the case when $X_1,\dots,X_n$ form the
initial segment of an infinite sequence of i.i.d. r.v.'s $X_1, X_2, \dots$ with finite $\mathsf{E}X_1^2$
and $\mathsf{E}(X_1)_+^3$, $n$ is large, and $y$ is of the order of $\sqrt{n}$ (such a situation occurs in
proofs of non-uniform Berry-Esseen type bounds).

Note also that the mentioned Theorem 2 in [52] is formally more general than
inequality (1.5), in that [52, Theorem 2] is given in terms of $\beta_p^+ := \sum_i \mathsf{E}(X_i)_+^p$
for any $p \in [2,3]$, rather than $\beta_3^+$. However, the exact upper bound in [52,
Theorem 2] on $\mathsf{E}e^{\lambda S}$ with $p \in [2,3]$ is no less than that with $p = 3$, since
$\beta_3^+ \le \beta_p^+\, y^{3-p}$ for all $p \in [2,3]$. Thus, nothing will be lost by taking $p$ to be just
3.

As pointed out in [26, 52], the exponential bounds $\mathrm{BH}_{\exp}(\lambda)$ and $\mathrm{PU}_{\exp}(\lambda)$
are each exact in its own terms. That is, $\mathrm{BH}_{\exp}(\lambda)$ is the exact upper bound on
$\mathsf{E}e^{\lambda S}$ with $\lambda$, $y$, and $\sigma$ fixed; and $\mathrm{PU}_{\exp}(\lambda)$ is the exact upper bound on $\mathsf{E}e^{\lambda S}$
with $\lambda$, $y$, $\sigma$, and $\varepsilon$ fixed.

If $\varepsilon$ is indeed small, then the bounds $\mathrm{PU}_{\exp}(\lambda)$ and $\mathrm{PU}(x)$ are close to
the corresponding exponential bounds for the normal distribution, $e^{\lambda^2\sigma^2/2}$ and
$e^{-x^2/(2\sigma^2)}$. However, even for a standard normal r.v. $Z$, the best exponential
upper bound, $e^{-x^2/2}$, on the tail probability $\mathsf{P}(Z \ge x)$ is "missing" a factor of
the order of $1/x$ for large $x > 0$, since $\mathsf{P}(Z \ge x) \sim \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}$ as $x \to \infty$.
This deficiency of exponential bounds is caused by the fact that the class of all
increasing exponential functions is too small.
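The "missing" factor can be seen numerically. The sketch below (illustrative names, standard library only) compares the exact standard normal tail with the best exponential bound $e^{-x^2/2}$ and with the asymptotic factor $1/(x\sqrt{2\pi})$.

```python
import math

def normal_tail(x):
    """P(Z >= x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def best_exp_bound(x):
    """inf over lambda >= 0 of exp(-lambda*x + lambda^2/2); equals exp(-x^2/2) for x >= 0."""
    return math.exp(-0.5 * x * x)

def missing_factor(x):
    """Ratio of the true tail to the exponential bound; behaves like 1/(x*sqrt(2*pi))."""
    return normal_tail(x) / best_exp_bound(x)

if __name__ == "__main__":
    for x in (2.0, 5.0, 10.0):
        print(x, missing_factor(x), 1.0 / (x * math.sqrt(2.0 * math.pi)))
```

The last two printed columns approach each other as $x$ grows, in line with the asymptotic relation stated above.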

Apparently the first step towards removing this deficiency was made by
Eaton [21, 22], who proved that for all functions $f$ in a rich class containing
all functions of the form $\mathbb{R} \ni x \mapsto (|x| - t)_+^3$ for $t > 0$ one has

$$\mathsf{E} f(S) \le \mathsf{E} f(Z) \tag{1.7}$$

if $X_i = a_i\eta_i$ for all $i$, where the $\eta_i$'s are independent (not necessarily identically
distributed) zero-mean r.v.'s such that $|\eta_i| \le 1$ a.s. for all $i$, and $\sum_i a_i^2 = 1$.

It is easy to see that inequality (1.7) for all $f$ in the Eaton class implies the
same inequality for all symmetrized exponential functions of the form $\mathbb{R} \ni x \mapsto \cosh\lambda x$,
with any $\lambda \in \mathbb{R}$. In view of the central limit theorem, it is clear that
the upper bound $\mathsf{E}f(Z)$ in (1.7) on $\mathsf{E}f(S)$ is exact for each $f$ in the Eaton class. Moreover,
then the inequality

$$\mathsf{P}(|S| \ge x) \le \mathrm{Ea}(x) := \inf_{t \in (0,x)} \frac{\mathsf{E}(|Z| - t)_+^3}{(x - t)^3} \tag{1.8}$$

for $x > 0$ provides the best possible upper bound $\mathrm{Ea}(x)$ on $\mathsf{P}(|S| \ge x)$ based on
comparison inequality (1.7). Eaton showed that the bound $\mathrm{Ea}(x)$ is majorized
by a function which is asymptotic to $c_3\, \mathsf{P}(|Z| \ge x)$ as $x \to \infty$, where $c_3 := 2e^3/9 \approx 4.46$.
Thus, the "missing" factor of the order of $1/x$ was restored, for the
bounded $X_i$'s. Tables for the bound Ea and related bounds were given in [19].
Eaton [22] also conjectured that $\mathsf{P}(|S| \ge x) \le 2c_3\, \frac{1}{x}\, e^{-x^2/2}/\sqrt{2\pi}$ for $x > \sqrt{2}$.
The stronger form of this conjecture,

$$\mathsf{P}(S \ge x) \le c\, \mathsf{P}(Z \ge x) \tag{1.9}$$

for all $x \in \mathbb{R}$ with $c = c_3$ was proved by Pinelis [36], along with multidimensional
extensions and applications to the Hotelling-type tests. (More exactly, in [36]
a two-tail version of inequality (1.9) was given. The right-tail inequality (1.9)
can be proved quite similarly; alternatively, it follows from general results of
[38].) Various generalizations and improvements of inequality (1.9) as well as
related results were given by Pinelis [38, 39, 41, 43, 44, 46] and Bentkus [2, 3, 4, 5]
(with co-authors). For Rademacher $\eta_i$'s, a version of (1.9) with a better constant
factor $c$, which is about 1% off the best possible one, was given in [47]; related
inequalities were obtained in [8, 48].

Pinelis [38] provided a general device allowing one to extract the optimal tail
comparison inequality from a generalized moment comparison. To state that
result, consider the Eaton-type classes of functions $f\colon \mathbb{R} \to \mathbb{R}$:

$$\mathcal{H}_+^\alpha := \Big\{f \colon f(u) = \int_{-\infty}^{\infty} (u - t)_+^\alpha\, \mu(dt) \ \ \forall u \in \mathbb{R}\Big\}, \quad \alpha \ge 0, \tag{1.10}$$

where $\mu \ge 0$ is a Borel measure, and $0^0 := 0$; of course, when used with functions
or classes of functions (as, for example, in the symbol $\mathcal{H}_+^\alpha$), the subscript $+$ will
have a meaning different from that in the definition $x_+ := 0 \vee x$.
It is easy to see [39, Proposition 1(ii)] that

$$0 \le \beta < \alpha \ \text{ implies } \ \mathcal{H}_+^\alpha \subseteq \mathcal{H}_+^\beta. \tag{1.11}$$

Proposition 1.1. [43] For natural $\alpha$, one has $f \in \mathcal{H}_+^\alpha$ if and only if $f$ has finite
derivatives $f^{(0)} := f$, $f^{(1)} := f'$, ..., $f^{(\alpha-1)}$ on $\mathbb{R}$ such that $f^{(\alpha-1)}$ is convex on
$\mathbb{R}$ and $f^{(j)}(-\infty+) = 0$ for $j = 0, 1, \dots, \alpha - 1$.

It follows from (1.11) and Proposition 1.1 that, for every $t \in \mathbb{R}$, every $\alpha \ge 0$,
every $\beta \ge \alpha$, and every $\lambda > 0$, the functions $u \mapsto (u - t)_+^\beta$ and $u \mapsto e^{\lambda(u-t)}$
belong to $\mathcal{H}_+^\alpha$.

The next theorem follows immediately from results of [38, 39]; in particular,
see [38, Theorem 3.11] (and its proof) and [39, Theorem 4].

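Theorem 1.2. Suppose that $0 \le \beta \le \alpha$, $\xi$ and $\eta$ are real-valued r.v.'s, and the
tail function $u \mapsto \mathsf{P}(\eta \ge u)$ is log-concave on $\mathbb{R}$. Then the comparison inequality

$$\mathsf{E} f(\xi) \le \mathsf{E} f(\eta) \quad\text{for all } f \in \mathcal{H}_+^\alpha \tag{1.12}$$

implies

$$\mathsf{E} f(\xi) \le c_{\alpha,\beta}\, \mathsf{E} f(\eta) \quad\text{for all } f \in \mathcal{H}_+^\beta \tag{1.13}$$

and, in particular, for all real $x$,

$$\mathsf{P}(\xi \ge x) \le P_\alpha(\eta; x) := \inf_{t \in (-\infty,x)} \frac{\mathsf{E}(\eta - t)_+^\alpha}{(x - t)^\alpha} \tag{1.14}$$

$$\le c_{\alpha,0}\, \mathsf{P}(\eta \ge x), \tag{1.15}$$

where

$$c_{\alpha,\beta} := \frac{\Gamma(\alpha+1)(e/\alpha)^\alpha}{\Gamma(\beta+1)(e/\beta)^\beta}$$

for $\beta > 0$; $c_{\alpha,0} := \Gamma(\alpha+1)(e/\alpha)^\alpha$. Moreover, the constant $c_{\alpha,\beta}$ is the best possible
in (1.13) and (1.15).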
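For orientation, the constants of Theorem 1.2 are easy to evaluate. The short sketch below uses an illustrative function name `c` and reproduces the two values that are used later in the text, $c_{3,0} = 2e^3/9$ and $c_{2,0} = e^2/2$.

```python
import math

def c(alpha, beta=0.0):
    """Constant c_{alpha,beta} of Theorem 1.2, for 0 <= beta <= alpha and alpha > 0."""
    top = math.gamma(alpha + 1.0) * (math.e / alpha) ** alpha
    if beta == 0:
        return top
    return top / (math.gamma(beta + 1.0) * (math.e / beta) ** beta)

if __name__ == "__main__":
    print(c(3), 2.0 * math.e ** 3 / 9.0)   # c_{3,0}
    print(c(2), math.e ** 2 / 2.0)         # c_{2,0}
```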
A similar result for the case when $\alpha = 1$ and $\beta = 0$ is contained in the book
by Shorack and Wellner (1986) [56], pages 797-799.

Definition 1.3. For any r.v. $\eta$, let the function $\mathbb{R} \ni x \mapsto \mathsf{P}^{\mathsf{LC}}(\eta \ge x)$ be defined
as the least log-concave majorant over $\mathbb{R}$ of the tail function $\mathbb{R} \ni x \mapsto \mathsf{P}(\eta \ge x)$
of the r.v. $\eta$.

Remark 1.4. One has $\mathsf{P}^{\mathsf{LC}}(a + b\eta \ge x) = \mathsf{P}^{\mathsf{LC}}\big(\eta \ge \frac{x-a}{b}\big)$ for all $x \in \mathbb{R}$ and all real
constants $a$ and $b$ such that $b > 0$.

Remark 1.5. As follows from [38, Remark 3.13], a useful point is that the requirement
of the log-concavity of the tail function $\mathbb{R} \ni u \mapsto \mathsf{P}(\eta \ge u)$ in Theorem
1.2 can be removed, at least as far as (1.15) is concerned, by replacing
$\mathsf{P}(\eta \ge x)$ in (1.15) with $\mathsf{P}^{\mathsf{LC}}(\eta \ge x)$. However, then the optimality of $c_{\alpha,\beta}$ is not
guaranteed.

Detailed studies of various cases and aspects of the bound $P_\alpha(\eta; x)$ defined
in (1.14) were presented in [19, 38, 6].

Note that $c_{3,0} = c_3 = 2e^3/9$, which is the constant factor mentioned above,
after inequality (1.8).

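Going back to the Bennett-Hoeffding and Pinelis-Utev bounds defined in (1.3)
and (1.5), observe that they have a transparent probabilistic interpretation:

$$\mathrm{BH}_{\exp}(\lambda) = \mathsf{E}\exp\big\{\lambda y \tilde\Pi_{\sigma^2/y^2}\big\} \quad\text{and} \tag{1.16}$$
$$\mathrm{PU}_{\exp}(\lambda) = \mathsf{E}\exp\big\{\lambda\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}\big)\big\} \tag{1.17}$$

for all $\lambda \ge 0$, where the following definition is employed.

Definition 1.6. For any $a > 0$ and $\theta > 0$, let $\Gamma_{a^2}$ and $\Pi_\theta$ stand for any
independent r.v.'s such that

$$\Gamma_{a^2} \sim \mathrm{N}(0, a^2) \quad\text{and}\quad \Pi_\theta \sim \mathrm{Pois}(\theta);$$

that is, $\Gamma_{a^2}$ has the normal distribution with parameters $0$ and $a^2$, and $\Pi_\theta$ has
the Poisson distribution with parameter $\theta$; at that, let $\Gamma_0$ and $\Pi_0$ be defined as
the constant zero r.v. Let also

$$\tilde\Pi_\theta := \Pi_\theta - \mathsf{E}\,\Pi_\theta = \Pi_\theta - \theta.$$

Thus, (1.3) and (1.5) can be viewed as the generalized moment comparison
inequalities

$$\mathsf{E} f(S) \le \mathsf{E} f\big(y \tilde\Pi_{\sigma^2/y^2}\big) \quad\text{and} \tag{1.18}$$
$$\mathsf{E} f(S) \le \mathsf{E} f\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}\big), \tag{1.19}$$

over the class of all increasing exponential functions $\mathbb{R} \ni x \mapsto f(x) = e^{\lambda x}$,
$\lambda > 0$. Note that, of the total variance $\sigma^2$ of the r.v. $\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}$ in
(1.19), the part of the variance equal to $(1-\varepsilon)\sigma^2$ is apportioned to the light-tail
centered-Gaussian component $\Gamma_{(1-\varepsilon)\sigma^2}$, while the rest of the variance, $\varepsilon\sigma^2$, is
apportioned to the heavy-tail centered-Poisson component $y \tilde\Pi_{\varepsilon\sigma^2/y^2}$.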
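The probabilistic interpretation (1.16)-(1.17) can be verified directly by summing the Poisson series. The sketch below is only an illustration under moderate parameter values (the series is truncated at a fixed number of terms, which is an assumption of this sketch); all function names are hypothetical.

```python
import math

def centered_poisson_mgf(s, theta, terms=200):
    """E exp(s * (Pi_theta - theta)), by direct summation of the Poisson series.
    Assumes theta and s are moderate, so that 200 terms suffice and nothing overflows."""
    total = 0.0
    pmf = math.exp(-theta)              # P(Pi_theta = 0)
    for k in range(terms):
        total += pmf * math.exp(s * (k - theta))
        pmf *= theta / (k + 1)          # P(Pi_theta = k + 1)
    return total

def bh_exp(lam, sigma, y):
    """BH_exp(lambda) from (1.3)."""
    return math.exp((math.exp(lam * y) - 1.0 - lam * y) / (y * y) * sigma * sigma)

def pu_exp(lam, sigma, y, eps):
    """PU_exp(lambda) from (1.5)."""
    s2 = sigma * sigma
    return math.exp(0.5 * lam * lam * (1.0 - eps) * s2
                    + (math.exp(lam * y) - 1.0 - lam * y) / (y * y) * eps * s2)

def pu_exp_probabilistic(lam, sigma, y, eps):
    """Right-hand side of (1.17): Gaussian m.g.f. times centered-Poisson m.g.f."""
    s2 = sigma * sigma
    gaussian = math.exp(0.5 * lam * lam * (1.0 - eps) * s2)
    poisson = centered_poisson_mgf(lam * y, eps * s2 / (y * y))
    return gaussian * poisson

if __name__ == "__main__":
    print(bh_exp(1.2, 1.0, 0.5), centered_poisson_mgf(1.2 * 0.5, 1.0 / 0.25))
    print(pu_exp(1.2, 1.0, 0.5, 0.3), pu_exp_probabilistic(1.2, 1.0, 0.5, 0.3))
```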
Bentkus [2, 4] extended inequality (1.18) to all $f$ of the form $f(x) = (x - t)_+^2$;
hence, recalling (1.10), one has (1.18) for all $f \in \mathcal{H}_+^2$. Moreover, it follows by
(1.14), (1.15), and Remark 1.5 that for all $x \ge 0$

$$\mathsf{P}(S \ge x) \le \mathrm{Be}(x) := P_2\big(y \tilde\Pi_{\sigma^2/y^2}; x\big) \le c_{2,0}\, \mathsf{P}^{\mathsf{LC}}\big(y \tilde\Pi_{\sigma^2/y^2} \ge x\big); \tag{1.20}$$

note also that $c_{2,0} = e^2/2$. Similar results for stochastic integrals were obtained
in [28]. Since the class $\mathcal{H}_+^2$ contains all increasing exponential functions, the
Bentkus bound $\mathrm{Be}(x)$ is an improvement of the Bennett-Hoeffding bound $\mathrm{BH}(x)$
given by (1.1).

In this paper, we shall similarly improve the Pinelis-Utev exponential bounds
given by (1.5) and (1.6), which, as was mentioned, in turn refine and improve
the corresponding Bennett-Hoeffding bounds. This will require proofs of a
significantly higher level of difficulty, with some substantially new ideas.

2. Statements of the main results 

We shall show that the generalized moment comparison inequality (1.19) takes
place for all $f$ in $\mathcal{H}_+^3$ and, in fact, for all $f$ in the slightly larger class

$$\mathcal{F}_+^3 := \{f \in C^2 \colon f \text{ and } f'' \text{ are nondecreasing and convex}\}$$
$$= \{f \in C^2 \colon f, f', f'', f''' \text{ are nondecreasing}\}, \tag{2.1}$$

where $C^2$ denotes the class of all twice continuously differentiable functions
$f\colon \mathbb{R} \to \mathbb{R}$ and $f'''$ denotes the right derivative of the convex function $f''$. For
example, functions $x \mapsto a + bx + c\,(x - t)_+^\alpha$ and $x \mapsto a + bx + c\,e^{\lambda x}$ belong to
$\mathcal{F}_+^3$ for all $a \in \mathbb{R}$, $b \ge 0$, $c \ge 0$, $t \in \mathbb{R}$, $\alpha \ge 3$, and $\lambda \ge 0$. It is easy to see that
$\mathcal{H}_+^3 \subseteq \mathcal{F}_+^3$.

Remark. If a function $f\colon \mathbb{R} \to \mathbb{R}$ is convex and a r.v. $X$ has a finite expectation,
then, by Jensen's inequality, $\mathsf{E} f(X)$ always exists in $(-\infty, \infty]$. This remark will
be used in this paper (sometimes tacitly) for functions $f$ in the class $\mathcal{F}_+^3$, as well
as for other convex functions.

Let $X_1, \dots, X_n$ be independent r.v.'s, with the sum $S := X_1 + \dots + X_n$. Also,
recall now Definition 1.6.

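Theorem 2.1. Let $\sigma$, $y$, and $\beta$ be any (strictly) positive real numbers such that

$$\varepsilon := \frac{\beta}{\sigma^2 y} \in (0,1). \tag{2.2}$$

Suppose that

$$\sum_i \mathsf{E}X_i^2 \le \sigma^2, \quad \sum_i \mathsf{E}(X_i)_+^3 \le \beta, \quad \mathsf{E}X_i \le 0, \quad\text{and}\quad X_i \le y \text{ a.s.}, \tag{2.3}$$

for all $i$. Then

$$\mathsf{E} f(S) \le \mathsf{E} f\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}\big) \tag{2.4}$$

for all $f \in \mathcal{F}_+^3$.

The proof of Theorem 2.1 will be given in Section 4, where all the necessary
proofs are deferred to.

Note that the condition $\varepsilon \in (0,1)$ in (2.2) does not at all diminish generality,
since it is easy to see that $\sum_i \mathsf{E}(X_i)_+^3 < \sigma^2 y$ for any positive $\sigma$ and $y$ and any
r.v.'s $X_1, \dots, X_n$ such that $\sum_i \mathsf{E}X_i^2 \le \sigma^2$, $\mathsf{E}X_i \le 0$, and $X_i \le y$ a.s., for all $i$;
so, one can always choose $\beta$ to be in the interval $\big(\sum_i \mathsf{E}(X_i)_+^3,\ \sigma^2 y\big)$, and then
one will have $\varepsilon \in (0,1)$.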

Proposition 2.2. Let a larger class of functions be defined by removing $f$ from
the list "$f, f', f'', f'''$" in (2.1); similarly define a second, still larger class by removing both
$f$ and $f'$ from the same list; thus, each of these two new classes is larger than
the class $\mathcal{F}_+^3$.

(i) If the condition "$\mathsf{E}X_i \le 0\ \forall i$" in Theorem 2.1 is replaced by "$\mathsf{E}X_i = 0\ \forall i$",
then inequality (2.4) will hold for all $f$ in the first of these larger classes.

(ii) If the conditions "$\mathsf{E}X_i \le 0\ \forall i$" and "$\sum_i \mathsf{E}X_i^2 \le \sigma^2$" in Theorem 2.1 are both
replaced by the equalities "$\mathsf{E}X_i = 0\ \forall i$" and "$\sum_i \mathsf{E}X_i^2 = \sigma^2$", then (2.4) will
hold for all $f$ in the second, still larger class.

Proposition 2.3. For each triple $(\sigma, y, \beta)$ of positive numbers satisfying condition
(2.2) and each $f \in \mathcal{F}_+^3$, the upper bound $\mathsf{E} f\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}\big)$ on
$\mathsf{E} f(S)$, given by (2.4), is exact; moreover, this bound remains exact if the first
three inequalities in the condition (2.3) are replaced by the corresponding equalities.

Comparison inequality (2.4) is optimal in yet another sense: namely, there the
class $\mathcal{F}_+^3$ of generalized moment functions $f$ cannot be substantially enlarged if
(2.4) is to remain true. To state this optimality property more precisely, let us
first note a simple corollary of Theorem 2.1, which follows immediately because
$\mathcal{H}_+^3 \subseteq \mathcal{F}_+^3$.

Corollary 2.4. In Theorem 2.1, one can replace $\mathcal{F}_+^3$ by $\mathcal{H}_+^3$.

In fact, in Section 4 essentially we shall first prove Corollary 2.4 and then
extend the comparison inequality from $\mathcal{H}_+^3$ to $\mathcal{F}_+^3$. In this sense, one can say
that Theorem 2.1 and Corollary 2.4 are equivalent to each other. Now one is
ready to state the other optimality property:

Proposition 2.5. For any given $p \in (0,3)$, one cannot replace $\mathcal{H}_+^3$ in Corollary 2.4
by the larger class $\mathcal{H}_+^p$; in fact, this cannot be done even for $n = 1$.

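By Theorem 1.2 and Remark 1.5, one immediately obtains the following
corollary of Theorem 2.1.

Corollary 2.6. Under the conditions of Theorem 2.1, for all $x \in \mathbb{R}$

$$\mathsf{P}(S \ge x) \le \mathrm{Pin}(x) := P_3\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}; x\big) \tag{2.5}$$
$$\le c_{3,0}\, \mathsf{P}^{\mathsf{LC}}\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2} \ge x\big). \tag{2.6}$$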
(2.6) 
Bennett [1] states that "for most practical problems, precisely" "information
on the distribution function of a sum when the number of component random
variables is small and/or the variables have different distributions" "is required".
Accordingly, let us consider now the case when, instead of the upper bounds in
(2.3) on the sums of moments and the uniform a.s. upper bound $y$ on the $X_i$'s,
such upper bounds are available for the individual distributions of the summands
$X_i$, with possibly different upper bounds for different $i$. More specifically, some
of the summands $X_i$ may be significantly smaller (in a certain sense) than the
rest of them. Then, grouping them together and using certain results of [46],
one can obtain the following improvement of Theorem 2.1 and Corollary 2.6.

Corollary 2.7. Suppose that

$$X_i \le y_i \le y \text{ a.s.}, \quad \mathsf{E}X_i^2 \le \sigma_i^2, \quad \mathsf{E}(X_i)_+^3 \le \beta_i, \quad \mathsf{E}X_i \le 0, \tag{2.7}$$

for all $i$, where $y$, $y_i$, $\sigma_i$, and $\beta_i$ are some positive real numbers. Also, suppose
that (cf. (2.2))

$$\tilde\varepsilon := \frac{\tilde\beta}{\sigma^2 y} \in (0,1), \tag{2.8}$$

where

P:=^f3il{yi>ai} and fT:= /^a^. (2.9) 

i V ' 

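Then inequalities (2.4), (2.5), and (2.6) hold with $\tilde\varepsilon$ in place of $\varepsilon$:

$$\mathsf{E} f(S) \le \mathsf{E} f\big(\Gamma_{(1-\tilde\varepsilon)\sigma^2} + y \tilde\Pi_{\tilde\varepsilon\sigma^2/y^2}\big) \quad\text{for all } f \in \mathcal{F}_+^3; \tag{2.10}$$
$$\mathsf{P}(S \ge x) \le P_3\big(\Gamma_{(1-\tilde\varepsilon)\sigma^2} + y \tilde\Pi_{\tilde\varepsilon\sigma^2/y^2}; x\big) \tag{2.11}$$
$$\le c_{3,0}\, \mathsf{P}^{\mathsf{LC}}\big(\Gamma_{(1-\tilde\varepsilon)\sigma^2} + y \tilde\Pi_{\tilde\varepsilon\sigma^2/y^2} \ge x\big) \quad\text{for all } x \in \mathbb{R}. \tag{2.12}$$

Note that conditions (2.7) together with (2.9) will imply (2.3) if $\beta := \sum_i \beta_i$.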
As for condition (2.8), similarly to condition (2.2), it does not diminish gener- 
ality. In fact, one will obviously have 

(2.13) 

Then, (2.13) will imply (by Lemma 4.7) that inequalities (2.10), (2.11), and
(2.12), as established by Corollary 2.7, will respectively be improvements of
(2.4), (2.5), and (2.6).

For completeness, let us also present results similar to Theorem 2.1,
Propositions 2.2, 2.3, and 2.5, and Corollary 2.4, without conditions on the truncated
third moments $\mathsf{E}(X_i)_+^3$ and for somewhat larger classes of generalized moment
functions. Let (cf. (2.1))

$$\mathcal{F}_+^2 := \{f \in C^1 \colon f \text{ and } f' \text{ are nondecreasing and convex}\}$$
$$= \{f \in C^1 \colon f, f', f'' \text{ are nondecreasing}\}, \tag{2.14}$$

where $C^1$ denotes the class of all continuously differentiable functions $f\colon \mathbb{R} \to \mathbb{R}$
and $f''$ denotes the right derivative of the convex function $f'$. For example,
functions $x \mapsto a + bx + c\,(x - t)_+^\alpha$ and $x \mapsto a + bx + c\,e^{\lambda x}$ belong to $\mathcal{F}_+^2$ for all
$a \in \mathbb{R}$, $b \ge 0$, $c \ge 0$, $t \in \mathbb{R}$, $\alpha \ge 2$, and $\lambda \ge 0$. It is easy to see that $\mathcal{H}_+^2 \subseteq \mathcal{F}_+^2$,
and it is obvious that $\mathcal{F}_+^3 \subseteq \mathcal{F}_+^2$.

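Proposition 2.8. (Cf. Theorem 2.1.) Let $\sigma$ and $y$ be any (strictly) positive real
numbers. Suppose that

$$\sum_i \mathsf{E}X_i^2 \le \sigma^2, \quad \mathsf{E}X_i \le 0, \quad\text{and}\quad X_i \le y \text{ a.s.}, \tag{2.15}$$

for all $i$. Then

$$\mathsf{E} f(S) \le \mathsf{E} f\big(y \tilde\Pi_{\sigma^2/y^2}\big) \tag{2.16}$$

for all $f \in \mathcal{F}_+^2$.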

Proposition 2.9. (Cf. Proposition 2.2.) Let a larger class of functions be
defined by removing $f$ from the list "$f, f', f''$" in (2.14); similarly define a
second, still larger class by removing both $f$ and $f'$ from the same list; thus, each of these
two new classes is larger than the class $\mathcal{F}_+^2$.

(i) If the condition "$\mathsf{E}X_i \le 0\ \forall i$" in Proposition 2.8 is replaced by "$\mathsf{E}X_i = 0\ \forall i$",
then inequality (2.16) will hold for all $f$ in the first of these larger classes.
(ii) If the conditions "$\mathsf{E}X_i \le 0\ \forall i$" and "$\sum_i \mathsf{E}X_i^2 \le \sigma^2$" in Proposition 2.8 are
both replaced by the equalities "$\mathsf{E}X_i = 0\ \forall i$" and "$\sum_i \mathsf{E}X_i^2 = \sigma^2$", then
(2.16) will hold for all $f$ in the second, still larger class.

As mentioned in the Introduction, similar results for (continuous-time)
martingales that are stochastic integrals were obtained by Klein, Ma and Privault
[28], for the larger class in part (ii) of Proposition 2.9, that is, for the class of all
functions $f$ such that $f$ and $f'$ are convex. Cf. Remark 2.13 below.

Proposition 2.10. (Cf. Proposition 2.3.) For each pair $(\sigma, y)$ of positive numbers
and each $f \in \mathcal{F}_+^2$, the upper bound $\mathsf{E} f\big(y \tilde\Pi_{\sigma^2/y^2}\big)$ on $\mathsf{E} f(S)$, given by (2.16),
is exact; moreover, this bound remains exact if the first two inequalities in the
condition (2.15) are replaced by the corresponding equalities.

Corollary 2.11. (Cf. Corollary 2.4.) In Proposition 2.8, one can replace $\mathcal{F}_+^2$
by $\mathcal{H}_+^2$.

As mentioned in the Introduction, Corollary 2.11 is essentially contained
in Bentkus [4]. By Theorem 1.2 and Remark 1.5, Corollary 2.11 immediately
implies the Bentkus inequality (1.20).

Proposition 2.12. (Cf. Proposition 2.5.) For any given $p \in (0,2)$, one cannot
replace $\mathcal{H}_+^2$ in Corollary 2.11 by the larger class $\mathcal{H}_+^p$; in fact, this cannot be done
even for $n = 1$.
Remark 2.13. Quite similarly to how it was done e.g. in [44, 46], it is easy to
extend the results of Theorem 2.1, Propositions 2.2, 2.8, and 2.9, and Corollary 2.6
to the more general case when the $X_i$'s are the incremental differences
of a (discrete-time) (super)martingale and/or replace $S$ by the maximum of the
partial sums; cf. e.g. [46, Corollary 5]. Let us omit the details.

On majorization of the distributions of sums of independent r.v.'s by compound
Poisson distributions see e.g. [54, 51, 52, 58, 40, 10]. Also indirectly
related to the present paper is the work [11, 9], where it was shown that the
rate of convergence in the functional central limit theorem can be significantly
improved if the limit is taken to be the convolution of appropriately chosen
Gaussian and Poisson distributions, rather than just a Gaussian distribution.
Of course, this quite well corresponds with the fact that the limit distributions
for the sums of uniformly small independent r.v.'s are precisely the limits of
convolutions of Gaussian and compound Poisson distributions. One may also
note here the work [31], where, by taking specific heavy tails into account,
asymptotics of large deviation probabilities $\mathsf{P}(S_n \ge x)$ for the sum $S_n$ of i.i.d.
r.v.'s was obtained essentially without any restrictions on $x$ other than that just
$x/\sqrt{n} \to \infty$ or, equivalently, $\mathsf{P}(S_n \ge x) \to 0$; functional versions of such results
were given in [34].



3. Computation and comparison of the upper bounds on the tail
probability $\mathsf{P}(S \ge x)$

3.1. Computation

The Bennett-Hoeffding upper bound $\mathrm{BH}(x)$, given by (1.1), is quite easy to
compute. It is almost as easy to compute the Pinelis-Utev upper bound $\mathrm{PU}(x)$,
defined in (1.6).

Proposition 3.1. For all $\sigma > 0$, $y > 0$, $\varepsilon \in (0,1)$, and $x \ge 0$

$$\mathrm{PU}(x) = e^{-\lambda_x x}\, \mathrm{PU}_{\exp}(\lambda_x) \tag{3.1}$$
$$= \exp\frac{(1-\varepsilon)^2 (w_x + 1)^2 - (\varepsilon + xy/\sigma^2)^2 - (1 - \varepsilon^2)}{2(1-\varepsilon)\, y^2/\sigma^2}, \tag{3.2}$$

where $\mathrm{PU}_{\exp}$ is defined in (1.5),

$$\lambda_x := \frac{1}{y}\Big(\frac{\varepsilon + xy/\sigma^2}{1-\varepsilon} - w_x\Big), \qquad w_x := L\Big(\frac{\varepsilon}{1-\varepsilon}\, \exp\frac{\varepsilon + xy/\sigma^2}{1-\varepsilon}\Big), \tag{3.3}$$

and $L$ is (the principal branch of) the Lambert product-log function, so that for
all $z \ge 0$ the value $w = L(z)$ is the only real root of the equation $w e^w = z$.
Moreover, $\lambda_x$ increases in $x$ from $0$ to $\infty$ as $x$ does so.

Thus, indeed $\mathrm{PU}(x)$ is easy to compute, since the Lambert function is about
as easy to compute as the logarithmic one; in particular, in Mathematica the
Lambert function is the built-in function ProductLog; see e.g. [17] and references
there concerning this matter.

A slight advantage of expression (3.2) over (3.1) is that (3.2) contains just
one entry of $w_x$, while (3.1) contains several entries of $\lambda_x$ (recall (1.5) and (3.3));
also, the exponent in (3.2) is algebraic (actually quadratic) in $w_x$.
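As an illustration of Proposition 3.1, here is a minimal sketch in Python rather than Mathematica. It is not the author's code: the helper `lambert_w0` is a plain Newton iteration written for this sketch (an assumption, valid for $z \ge 0$ and for parameter values where `exp` does not overflow), and the function names are illustrative. It evaluates (3.2)-(3.3) and cross-checks the result against (3.1).

```python
import math

def lambert_w0(z):
    """Principal branch L(z) of the Lambert function for z >= 0 (Newton's method).
    The start ln(1 + z) lies at or above the root because (1 + z) ln(1 + z) >= z."""
    w = math.log1p(z)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= 1e-15 * (1.0 + abs(w)):
            break
    return w

def pu_exp(lam, sigma, y, eps):
    """PU_exp(lambda) from (1.5)."""
    s2 = sigma * sigma
    return math.exp(0.5 * lam * lam * (1.0 - eps) * s2
                    + (math.exp(lam * y) - 1.0 - lam * y) / (y * y) * eps * s2)

def pu(x, sigma, y, eps):
    """Return (PU(x), lambda_x) using (3.2) and (3.3)."""
    s2 = sigma * sigma
    a = eps + x * y / s2
    kappa = a / (1.0 - eps)
    w = lambert_w0(eps / (1.0 - eps) * math.exp(kappa))
    lam = (kappa - w) / y
    exponent = ((1.0 - eps) ** 2 * (w + 1.0) ** 2 - a * a - (1.0 - eps * eps)) \
        / (2.0 * (1.0 - eps) * y * y / s2)
    return math.exp(exponent), lam

if __name__ == "__main__":
    value, lam = pu(3.0, 1.0, 1.0, 0.1)
    print(value, math.exp(-lam * 3.0) * pu_exp(lam, 1.0, 1.0, 0.1))  # (3.2) versus (3.1)
```

For very large $xy/\sigma^2$ the argument of $L$ overflows in floating point; a production implementation would work with logarithms there, which this sketch does not attempt.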

As for bounds $\mathrm{Be}(x) = P_2\big(y \tilde\Pi_{\sigma^2/y^2}; x\big)$ and $\mathrm{Pin}(x) = P_3\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}; x\big)$,
as defined by (1.20) and (2.5), the computation of $P_\alpha(\eta; x)$ for general $\alpha$ and
$\eta$ is described by [38, Theorem 2.5]; for normal $\eta$, similar considerations were
given already in [36, page 363]. The following proposition is essentially a special
case of [38, Theorem 2.5].

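Proposition 3.2. Take any real $\alpha > 1$ and let $\eta$ be any real-valued r.v. such
that $\mathsf{E}\eta_+^\alpha < \infty$. Then there exists $\mathsf{E}\eta \in [-\infty, \infty)$. Let

$$x_* := \sup \operatorname{supp}(\eta) \quad\text{and}\quad x_{**} := \sup\big(\operatorname{supp}(\eta) \setminus \{x_*\}\big),$$

where $\operatorname{supp}(\eta)$ denotes, as usual, the topological support of the distribution of
the r.v. $\eta$ (note that $x_{**} = x_*$ unless $x_*$ is an isolated point of $\operatorname{supp}(\eta)$; in most
applications, $x_* = \infty$ and hence $x_{**} = \infty$). For all $t \in (-\infty, x_*)$, let

$$m(t) := m_{\alpha,\eta}(t) := t + \frac{\mathsf{E}(\eta - t)_+^\alpha}{\mathsf{E}(\eta - t)_+^{\alpha-1}} = \frac{\mathsf{E}\,\eta\,(\eta - t)_+^{\alpha-1}}{\mathsf{E}(\eta - t)_+^{\alpha-1}}; \tag{3.4}$$

let also $m(x_*) := x_*$. Then

(i) the function $m$ is continuous on $(-\infty, x_*)$, left-continuous at $x_*$,
strictly increasing on $(-\infty, x_{**})$, from $\mathsf{E}\eta$ to $x_*$; also, $m(t) = x_*$ for all
$t \in [x_{**}, x_*]$.

(ii) for every $x \in (\mathsf{E}\eta, x_*)$ there exists a unique $t_x = t_{x;\alpha,\eta} \in (-\infty, x_{**})$ such
that

$$m(t_x) = x; \tag{3.5}$$

in fact, $t_x \in (-\infty, x)$;

(iii) for every $x \in (\mathsf{E}\eta, x_*)$

$$P_\alpha(\eta; x) = \frac{\mathsf{E}(\eta - t_x)_+^\alpha}{(x - t_x)^\alpha} = \frac{\mathsf{E}^\alpha(\eta - t_x)_+^{\alpha-1}}{\mathsf{E}^{\alpha-1}(\eta - t_x)_+^\alpha}; \tag{3.6}$$

(iv) (a) if $x \in (-\infty, \mathsf{E}\eta]$ then $P_\alpha(\eta; x) = 1$;
(b) if $x \in [x_*, \infty)$ then $P_\alpha(\eta; x) = \mathsf{P}(\eta = x) = \mathsf{P}(\eta \ge x)$;
it is therefore natural to extend $P_\alpha(\eta; x)$ to all $x \in [-\infty, \infty]$ by letting
$P_\alpha(\eta; -\infty) := 1$ and $P_\alpha(\eta; \infty) := 0$, as will henceforth be assumed;

(v) $P_\alpha(\eta; x)$ strictly and continuously decreases from $1$ to $\mathsf{P}(\eta = x_*) = \mathsf{P}(\eta \ge x_*)$
as $x$ increases from $\mathsf{E}\eta$ to $x_*$; more specifically,
(a) the function $x \mapsto P_\alpha(\eta; x)$ is strictly decreasing on $(\mathsf{E}\eta, x_*)$;
(b) it is also continuous on $(\mathsf{E}\eta, x_*)$, right-continuous at $\mathsf{E}\eta$, and
left-continuous at $x_*$; hence, it is in fact strictly decreasing on the entire
closed interval $[\mathsf{E}\eta, x_*]$;

(vi) for any $a \in \mathbb{R}$ and $b > 0$, one has

for all $x \in (\mathsf{E}\eta, x_*)$;
$P_\alpha(a + b\eta; x) = P_\alpha\big(\eta; \frac{x-a}{b}\big)$ for all $x \in \mathbb{R}$.

(Concerning the case $\alpha \in (0, 1]$, see [38, Remark 2.6].)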

The following example illustrates Proposition 3.2, and also Proposition 3.5
(to be presented later, in Subsubsection 3.2.1).

Example. Take any real $\alpha > 1$. Let $\eta$ be a zero-mean r.v. taking on only
two values, $-a$ and $b$, where $a$ and $b$ are arbitrary positive real numbers. Then
$x_* = b$, $x_{**} = -a$, and, using (say) the first expression for $P_\alpha(\eta; x)$ in (3.6), one
can see that

$$P_\alpha(\eta; x) = \frac{(b+a)^{\alpha-1}\, b\, a}{\Big[\big(b\,(x+a)^\alpha\big)^{\frac{1}{\alpha-1}} + \big(a\,(b-x)^\alpha\big)^{\frac{1}{\alpha-1}}\Big]^{\alpha-1}}$$

for all $x \in [0, b]$; also, $P_\alpha(\eta; x) = 1$ for all $x \in [-\infty, 0]$ and $P_\alpha(\eta; x) = \frac{a}{a+b}\, \mathrm{I}\{x = b\} = \mathsf{P}(\eta \ge x)$
for all $x \in [b, \infty]$.

Here the picture on the left shows the graph $\{(t, m(t)) \colon -5 < t < x_*\}$ for
$a = 1$, $b = 3$ and $\alpha = 1.2$, while the picture on the right shows the graphs
$\{(x, P_\alpha(\eta; x)) \colon -2 < x < x_* + 3\}$ (the thick line) and $\{(x, \mathsf{P}(\eta \ge x)) \colon -2 < x < x_* + 3\}$
(thick-dotted over the thin line), also for $a = 1$, $b = 3$ and $\alpha = 1.2$.
A gap is seen in the graph $\{(t, m(t)) \colon -5 < t < x_*\}$ in a left neighborhood
of $t = -1$, which is caused (despite making, with Mathematica, 15 recursive
subdivisions with 1000 initial sample points) by a very steep increase of the
function $m$ in such a neighborhood; for instance, $m(-1.000001)$ is only $2.498\ldots$,
while $m(-1) = 3$; yet, according to Proposition 3.2(i), there is no discontinuity
there. The picture on the right also shows (see definition (3.11) and relation
(3.13) below) the graph $\{(x, P_\infty(\eta; x)) \colon -2 < x < x_* + 3\}$ (the thinner line) of
the best exponential bound

$$P_\infty(\eta; x) = \inf_{\lambda \ge 0} e^{-\lambda x}\, \mathsf{E} e^{\lambda\eta} = \lim_{\alpha \to \infty} P_\alpha(\eta; x) = \Big(\frac{a}{x+a}\Big)^{\frac{x+a}{a+b}} \Big(\frac{b}{b-x}\Big)^{\frac{b-x}{a+b}}$$

for all $x \in [0, b)$, also with $P_\infty(\eta; x) = 1$ for all $x \in [-\infty, 0]$ and $P_\infty(\eta; x) = \frac{a}{a+b}\, \mathrm{I}\{x = b\} = \mathsf{P}(\eta \ge x)$
for all $x \in [b, \infty]$. While, in this case, one may not
be greatly impressed with the overall degree of closeness of the upper bound
$P_\alpha(\eta; x)$ to $\mathsf{P}(\eta \ge x)$, note that in the "large-deviation" zone $x \ge b$ the performance
of the bound $P_\alpha(\eta; x)$ is perfect: $P_\alpha(\eta; x) = \mathsf{P}(\eta \ge x)$ for all $x \ge b$, just
in accordance with Proposition 3.2(iv)(b).

imsart ver. 2005/05/19 file: arxiv.tex date: February 24, 2009
Iosif Pinelis/On the Bennett-Hoeffding bound
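The two-point Example lends itself to a numerical cross-check of Proposition 3.2. The sketch below is an illustration only, with hypothetical function names: it solves $m(t_x) = x$ by bisection for a finitely supported $\eta$, evaluates the first expression in (3.6), and compares the result with the closed form stated in the Example. It assumes $\mathsf{E}\eta < x < x_*$ and $\alpha > 1$.

```python
import math

def pos_moment(values, probs, t, p):
    """E (eta - t)_+^p for a finitely supported eta."""
    return sum(q * (v - t) ** p for v, q in zip(values, probs) if v > t)

def m(values, probs, t, alpha):
    """The function m of (3.4)."""
    return t + pos_moment(values, probs, t, alpha) / pos_moment(values, probs, t, alpha - 1.0)

def p_alpha(values, probs, x, alpha):
    """P_alpha(eta; x) for E eta < x < max(values), via the root t_x of m(t) = x."""
    hi = x                                    # m(x) > x
    lo = x - 1.0
    while m(values, probs, lo, alpha) >= x:   # m(t) tends to E eta < x as t -> -infinity
        lo = x - 2.0 * (x - lo)
    for _ in range(200):                      # bisection on the nondecreasing function m
        mid = 0.5 * (lo + hi)
        if m(values, probs, mid, alpha) < x:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return pos_moment(values, probs, t, alpha) / (x - t) ** alpha

def p_alpha_two_point(a, b, x, alpha):
    """Closed form from the Example, for 0 <= x <= b."""
    r = 1.0 / (alpha - 1.0)
    denom = ((b * (x + a) ** alpha) ** r + (a * (b - x) ** alpha) ** r) ** (alpha - 1.0)
    return (b + a) ** (alpha - 1.0) * b * a / denom

if __name__ == "__main__":
    values, probs = (-1.0, 3.0), (0.75, 0.25)   # a = 1, b = 3, zero mean
    print(m(values, probs, -1.000001, 1.2))      # the steep region mentioned in the text
    for x in (0.5, 1.0, 2.0):
        print(x, p_alpha(values, probs, x, 1.2), p_alpha_two_point(1.0, 3.0, x, 1.2))
```

Bisection is used here simply because $m$ is monotone by Proposition 3.2(i); any bracketing root finder would serve equally well.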

In particular, Proposition 3.2 shows that the computation of the upper bound
$P_\alpha(\eta; x)$ is based on that of the positive-part moments $\mathsf{E}(\eta - t)_+^\alpha$ and $\mathsf{E}(\eta - t)_+^{\alpha-1}$.
For $\alpha \in \{1, 2, 3\}$ and a number of common families of distributions of $\eta$, including
the Poisson one, this computation was detailed in [6]. In particular, see
formula [6, (10.5)] for $P_2(\eta; x)$ with a centered Poisson r.v. $\eta$. That formula is
relatively simple, since, for a natural $\alpha$ and a r.v. $\eta$ with (say) a lattice distribution,
the generalized moment $\mathsf{E}(\eta - t)_+^\alpha$ can be computed "locally"; indeed,
if $\cdots < d_k < d_{k+1} < \cdots$ are the atoms of the distribution of $\eta$, then for any
$t \in [d_k, d_{k+1})$ one a.s. has $\eta > t$ iff $\eta > d_k$; thus, for such $t$, $\mathsf{E}(\eta - t)_+^\alpha$ can be easily
expressed in terms of the truncated moments $\mathsf{E}(\eta - d_k)_+^j$ with $j = 0, \dots, \alpha$. These
comments provide a simple way to compute the bound $\mathrm{Be}(x) = P_2\big(y \tilde\Pi_{\sigma^2/y^2}; x\big)$.
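A rough numerical sketch of the computation of $\mathrm{Be}(x)$ follows. It does not use formula [6, (10.5)]; instead it sums directly over the atoms of the centered Poisson distribution, truncated far in the tail (the truncation point is an assumption of this sketch), and finds $t_x$ from $m(t_x) = x$ by bisection as in Proposition 3.2. The function names are illustrative, and moderate values of $\sigma^2/y^2$ are assumed.

```python
import math

def centered_poisson_atoms(theta, y, kmax=None):
    """Atoms y*(k - theta) and their probabilities for y * (Pi_theta - theta),
    truncated at kmax (assumption: the neglected tail mass is negligible)."""
    if kmax is None:
        kmax = int(theta + 40.0 * math.sqrt(theta) + 60.0)
    atoms, probs = [], []
    pmf = math.exp(-theta)
    for k in range(kmax + 1):
        atoms.append(y * (k - theta))
        probs.append(pmf)
        pmf *= theta / (k + 1)
    return atoms, probs

def pos_moment(atoms, probs, t, p):
    """E (eta - t)_+^p; only the atoms above t contribute (the "local" property)."""
    return sum(q * (v - t) ** p for v, q in zip(atoms, probs) if v > t)

def bh(x, sigma, y):
    """Bennett-Hoeffding bound (1.1), for comparison."""
    s2 = sigma * sigma
    u = x * y / s2
    return math.exp(-(s2 / (y * y)) * ((1.0 + u) * math.log1p(u) - u))

def be(x, sigma, y):
    """Be(x) = P_2(y * centered Poisson(sigma^2/y^2); x) for x > 0."""
    atoms, probs = centered_poisson_atoms(sigma * sigma / (y * y), y)

    def m(t):
        return t + pos_moment(atoms, probs, t, 2) / pos_moment(atoms, probs, t, 1)

    lo, hi = x - 1.0, x
    while m(lo) >= x:
        lo = x - 2.0 * (x - lo)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if m(mid) < x:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return pos_moment(atoms, probs, t, 2) / (x - t) ** 2

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0, 5.0):
        print(x, be(x, 1.0, 1.0), bh(x, 1.0, 1.0))
```

In the printed table the first column of bounds should not exceed the second, in line with the remark after (1.20) that $\mathrm{Be}(x)$ improves on $\mathrm{BH}(x)$.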

As for the bound $\mathrm{Pin}(x) = P_3\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}; x\big)$, here there is no such
nice localization property as the one mentioned in the previous paragraph, since
the distribution of the r.v. $\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}$ is not discrete. It appears that the
computation of the positive-part moments $\mathsf{E}(\eta - t)_+^\alpha$ for $\eta = \Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}$
can be done most effectively via formulas expressing such moments in terms
of the Fourier or Fourier-Laplace transform of the distribution of $\eta$; see [49],
where such formulas were developed (with this specific motivation in mind). A
reason for this approach to work is that the Fourier-Laplace transform of the
distribution of the r.v. $\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}$ has a simple expression (cf. (1.17)
and (1.5)).

Namely, one has

(3.7)

where $p \in (0,\infty)$, $s \in (0,\infty)$, $\Gamma$ is the Gamma function, $\Re\mathrm{e}$ denotes the real
part of a complex number, $i$ is the imaginary unit, $j = -1, 0, \dots, \ell$, $\ell := \lceil p - 1 \rceil$,
$e_j(u) := e^u - \sum_{m=0}^{j} \frac{u^m}{m!}$, and $X$ is any r.v. such that $\mathsf{E}|X|^{j_+} < \infty$ and $\mathsf{E}e^{sX} < \infty$. Also,

(3.8)

where $k := \dots$ and $X$ is any r.v. such that $\mathsf{E}|X|^p < \infty$. Of course, formulas (3.7)
and (3.8) will be applied here to r.v.'s of the form $X = \Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2} - w$,
with $w \in \mathbb{R}$.

3.2. Comparison

In this subsection, we shall compare the bounds $\mathrm{BH}(x)$, $\mathrm{PU}(x)$, $\mathrm{Be}(x)$, and
$\mathrm{Pin}(x)$, by means of identities and inequalities (in Subsubsection 3.2.1), asymptotic
relations for large $x > 0$ (in Subsubsection 3.2.2), and graphics and numerics
for moderate $x > 0$ (in Subsubsection 3.2.3); we shall also include into these
comparisons the Cantelli bound $\frac{\sigma^2}{\sigma^2 + x^2}$ and the best exponential upper bound
$\exp\{-\frac{x^2}{2\sigma^2}\}$ on the tail of the normal distribution $\mathrm{N}(0, \sigma^2)$.

3.2.1. Inequalities and identities 

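Let us begin here with the following simple proposition concerning the bounds
$P_\alpha(\eta; x)$ (as defined in (1.14)). Unless specified otherwise, let $\eta$ in this
subsubsection stand for any r.v., and take any $\alpha \in (0, \infty)$.

Proposition 3.3. For any $x \in \mathbb{R}$,

$$P_\alpha(\eta; x) = \inf\{\mathsf{E} f(\eta) \colon f \in \mathcal{H}_+^\alpha,\ f(u) \ge \mathrm{I}\{u \ge x\}\ \forall u \in \mathbb{R}\} \tag{3.9}$$
$$= \inf\Big\{\frac{\mathsf{E} f(\eta)}{f(x)} \colon f \in \mathcal{H}_+^\alpha,\ f(x) > 0\Big\}. \tag{3.10}$$

Now let us state general relations between the bounds $P_\alpha(\eta; x)$ for different
values of $\alpha$, as well as their relation with the best exponential upper bound

$$P_\infty(\eta; x) := \inf_{\lambda > 0} \frac{\mathsf{E} e^{\lambda\eta}}{e^{\lambda x}}. \tag{3.11}$$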

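Proposition 3.4.

$$\bigcap_{\alpha > 0} \mathcal{H}_+^\alpha = \mathcal{H}_+^\infty = \mathcal{H}_+^{\exp}, \tag{3.12}$$

where $\mathcal{H}_+^\infty$ is defined as the class of all infinitely differentiable real functions $f$
on $\mathbb{R}$ such that $f^{(j)} \ge 0$ on $\mathbb{R}$ and $f^{(j)}(-\infty+) = 0$ for all $j = 0, 1, \dots$, and

$$\mathcal{H}_+^{\exp} := \Big\{f \colon f(x) = \int_{(0,\infty)} e^{tx}\, \mu(dt)\ \ \forall x \in \mathbb{R}\Big\},$$

where $\mu$ denotes a nonnegative Borel measure such that the integral $\int_{(0,\infty)} e^{tx}\, \mu(dt)$
is finite $\forall x \in \mathbb{R}$; thus $\mathcal{H}_+^{\exp}$ may be viewed as a closed convex hull of the set of
all increasing exponential functions.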

Using Proposition 3.4, one can obtain

Proposition 3.5.

(i) The function $(0, \infty] \ni \alpha \mapsto P_\alpha(\eta; x)$ is nondecreasing.
(ii) For all $x \in \mathbb{R}$

$$P_\infty(\eta; x) = \lim_{\alpha \to \infty} P_\alpha(\eta; x). \tag{3.13}$$

For completeness, let us also consider the Cantelli bound

$$\mathrm{Ca}(x) := \mathrm{Ca}_{\sigma^2}(x) := \frac{\sigma^2}{\sigma^2 + x^2} \tag{3.14}$$

and the best exponential upper bound

$$\mathrm{EN}(x) := \mathrm{EN}_{\sigma^2}(x) := P_\infty(\Gamma_{\sigma^2}; x) = \exp\Big\{-\frac{x^2}{2\sigma^2}\Big\} \tag{3.15}$$

on the tail of the normal distribution $\mathrm{N}(0, \sigma^2)$; of course, in general $\mathrm{EN}(x)$ is
not an upper bound on $\mathsf{P}(S \ge x)$.

The bound $\mathrm{Ca}(x)$ can be presented in a form similar to (3.10) and (3.11):

Proposition 3.6. Take any $\sigma \in (0, \infty)$, any r.v.'s $\xi$ and $\eta$ such that $\mathsf{E}\xi \le 0 = \mathsf{E}\eta$
and $\mathsf{E}\xi^2 \le \mathsf{E}\eta^2 = \sigma^2$, and any $x \in [0, \infty)$. Then

$$\mathsf{P}(\xi \ge x) \le \mathrm{Ca}(x) = \inf_{t \in (-\infty, x)} \frac{\mathsf{E}(\eta - t)^2}{(x - t)^2}. \tag{3.16}$$

This proposition is essentially well known; yet, we shall provide a proof for
the readers' convenience.

Now we are ready to turn to relations between the four related bounds:
$\mathrm{BH}(x)$, $\mathrm{PU}(x)$, $\mathrm{Be}(x)$, and $\mathrm{Pin}(x)$, as well as $\mathrm{Ca}(x)$ and $\mathrm{EN}(x)$.

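Proposition 3.7. For all $x > 0$ and all values of the parameters: $\sigma > 0$, $y > 0$,
and $\varepsilon \in (0, 1)$,

(I) $\mathrm{Pin}(x) \le \mathrm{PU}(x) \le \mathrm{BH}(x)$ and $\mathrm{Be}(x) \le \mathrm{Ca}(x) \wedge \mathrm{BH}(x)$;
(II) $\mathrm{Be}(x) = \mathrm{Ca}(x)$ for all $x \in [0, y]$;
(III) $\mathrm{BH}(x)$ increases from $\mathrm{EN}(x)$ to $1$ as $y$ increases from $0$ to $\infty$;
(IV) there exists some $u_{y/\sigma} \in (0, \infty)$ that depends only on the ratio $y/\sigma$ such
that $\mathrm{Ca}(x) < \mathrm{BH}(x)$ if $x \in (0, \sigma u_{y/\sigma})$ and $\mathrm{Ca}(x) > \mathrm{BH}(x)$ if $x \in (\sigma u_{y/\sigma}, \infty)$;
moreover, $u_{y/\sigma}$ increases from $u_{0+} = 1.585\ldots$ to $\infty$ as $y/\sigma$
increases from $0$ to $\infty$; in particular, $\mathrm{Ca}(x) < \mathrm{EN}(x)$ if $x/\sigma \in (0, 1.585)$
and $\mathrm{Ca}(x) > \mathrm{EN}(x)$ for $x/\sigma \in (1.586, \infty)$.
(V) $\mathrm{PU}(x)$ increases from $\mathrm{EN}(x)$ to $\mathrm{BH}(x)$ as $\varepsilon$ increases from $0$ to $1$.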
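Parts (III) and (IV) of Proposition 3.7 are easy to probe numerically. The sketch below (illustrative names; bisection on a fixed bracket $[1, 3]$ chosen for this sketch) locates the crossing point $u_{0+}$ of the Cantelli bound and the normal exponential bound, and evaluates $\mathrm{BH}(x)$ for several values of $y$.

```python
import math

def ca(x, sigma):
    """Cantelli bound (3.14)."""
    return sigma ** 2 / (sigma ** 2 + x ** 2)

def en(x, sigma):
    """Best exponential bound (3.15) on the N(0, sigma^2) tail."""
    return math.exp(-x * x / (2.0 * sigma ** 2))

def bh(x, sigma, y):
    """Bennett-Hoeffding bound (1.1)."""
    s2 = sigma * sigma
    u = x * y / s2
    return math.exp(-(s2 / (y * y)) * ((1.0 + u) * math.log1p(u) - u))

def crossing_u0():
    """Positive root u of 1/(1 + u^2) = exp(-u^2/2), by bisection on [1, 3]."""
    def g(u):
        return (1.0 + u * u) * math.exp(-0.5 * u * u) - 1.0   # > 0 while Ca < EN
    lo, hi = 1.0, 3.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(crossing_u0())
    for y in (1e-3, 0.1, 1.0, 10.0):
        print(y, bh(2.0, 1.0, y), en(2.0, 1.0))
```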

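Proposition 3.8. For all $\sigma > 0$, $y > 0$, $\varepsilon \in (0, 1)$, and $x > 0$

$$\mathrm{PU}(x) = \max\big\{\mathrm{EN}_{(1-\varepsilon)\sigma^2}\big((1-\alpha)x\big)\, \mathrm{BH}_{\varepsilon\sigma^2, y}(\alpha x) \colon \alpha \in (0, 1)\big\} \tag{3.17}$$
$$= \mathrm{EN}_{(1-\varepsilon)\sigma^2}\big((1-\alpha_x)x\big)\, \mathrm{BH}_{\varepsilon\sigma^2, y}(\alpha_x x), \tag{3.18}$$

where $\alpha_x$ is the only root in $(0, 1)$ of the equation

$$\frac{(1-\alpha)x}{(1-\varepsilon)\sigma^2} - \frac{1}{y}\ln\Big(1 + \frac{\alpha x y}{\varepsilon\sigma^2}\Big) = 0. \tag{3.19}$$

Moreover, $\alpha_x$ increases from $\varepsilon$ to $1$ as $x$ increases from $0$ to $\infty$; in particular,

$$\alpha_x \in (\varepsilon, 1) \tag{3.20}$$

for all $x > 0$.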
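The product representation (3.17)-(3.18) can be checked against the definition (1.6). In the sketch below, which is an illustration with hypothetical function names, $\alpha_x$ is found from (3.19) by bisection, and $\mathrm{PU}(x)$ is computed independently by solving the stationarity condition for $\lambda$ in (1.6), also by bisection.

```python
import math

def en(x, var):
    """EN_var(x) = exp(-x^2 / (2 var))."""
    return math.exp(-x * x / (2.0 * var))

def bh(x, var, y):
    """BH_{var,y}(x) as in (1.1), with variance parameter var."""
    u = x * y / var
    return math.exp(-(var / (y * y)) * ((1.0 + u) * math.log1p(u) - u))

def alpha_x(x, sigma, y, eps):
    """Root in (0, 1) of equation (3.19); the left-hand side decreases in alpha."""
    s2 = sigma * sigma
    def lhs(a):
        return (1.0 - a) * x / ((1.0 - eps) * s2) - math.log1p(a * x * y / (eps * s2)) / y
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pu_split(x, sigma, y, eps):
    """Right-hand side of (3.18)."""
    s2 = sigma * sigma
    a = alpha_x(x, sigma, y, eps)
    return en((1.0 - a) * x, (1.0 - eps) * s2) * bh(a * x, eps * s2, y)

def pu_direct(x, sigma, y, eps):
    """PU(x) from (1.6): minimize the convex exponent h(lambda) over lambda >= 0."""
    s2 = sigma * sigma
    def dh(lam):   # derivative of h; increasing in lambda
        return -x + lam * (1.0 - eps) * s2 + (math.exp(lam * y) - 1.0) * eps * s2 / y
    lo, hi = 0.0, x / ((1.0 - eps) * s2)      # dh(lo) <= 0 <= dh(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dh(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    h = -lam * x + 0.5 * lam * lam * (1.0 - eps) * s2 \
        + (math.exp(lam * y) - 1.0 - lam * y) / (y * y) * eps * s2
    return math.exp(h)

if __name__ == "__main__":
    print(alpha_x(3.0, 1.0, 1.0, 0.1), pu_split(3.0, 1.0, 1.0, 0.1), pu_direct(3.0, 1.0, 1.0, 0.1))
```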
Expressions (3.17) and (3.18) provide a rather curious interpretation of the
bound $\mathrm{PU}(x)$ as the product of the best exponential upper bounds on the tails
$\mathsf{P}\big(\Gamma_{(1-\varepsilon)\sigma^2} \ge (1-\alpha)x\big)$ and $\mathsf{P}\big(y \tilde\Pi_{\varepsilon\sigma^2/y^2} \ge \alpha x\big)$, for some $\alpha$ in $(0, 1)$ (in fact, the
$\alpha$ is in the interval $(\varepsilon, 1)$). In view of (1.17), this interpretation should not come
as a big surprise. Proposition 3.8 will be useful in the proof of Proposition 3.12.

Proposition 3.9. For any $f \in \mathcal{H}_+^2$, $\sigma > 0$, $y > 0$, and $\varepsilon \in (0, 1)$,

$$\mathsf{E} f\big(\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}\big) \le \mathsf{E} f\big(y \tilde\Pi_{\sigma^2/y^2}\big). \tag{3.21}$$

So, by Proposition 3.9, of the two r.v.'s, $\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}$ and $y \tilde\Pi_{\sigma^2/y^2}$,
with the same variance $\sigma^2$, the former one (with a light-tail component
$\Gamma_{(1-\varepsilon)\sigma^2}$) is in a certain sense smaller than the latter, purely heavy-tail one.
This suggests that the upper bounds $\mathrm{Pin}(x)$ and $\mathrm{PU}(x)$, which are based on
$\Gamma_{(1-\varepsilon)\sigma^2} + y \tilde\Pi_{\varepsilon\sigma^2/y^2}$, will tend to be smaller than the bound $\mathrm{Be}(x)$, which is
based on $y \tilde\Pi_{\sigma^2/y^2}$. Such heuristics is to an extent justified by results of
Subsubsections 3.2.2 and 3.2.3, especially by Corollary 3.15 in Subsubsection 3.2.2 and
the graphics for $\varepsilon = 0.1$ in Subsubsection 3.2.3.

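Proposition 3.10. (Recall Definition 1.3.) For the least concave majorant of
the tail function of the Poisson distribution $\mathrm{Pois}(\theta)$ one has

$$\mathsf{P}^{\mathsf{LC}}(\Pi_\theta \ge u) = \mathsf{P}(\Pi_\theta \ge j)^{j+1-u}\,\mathsf{P}(\Pi_\theta \ge j+1)^{u-j} \le \mathsf{P}(\Pi_\theta \ge j)$$

for all $\theta > 0$ and $u \in \mathbb{R}$, where $j := j_u := \lceil u - 1 \rceil$.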
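
The formula in Proposition 3.10 is a log-linear interpolation of the Poisson tail between consecutive integers. A minimal Python sketch of it follows; the function names and the fixed number of summed terms are illustrative assumptions, and the tail is computed by direct summation, which is adequate only for moderate $\theta$.

```python
import math

def poisson_tail(theta, k, terms=400):
    """P(Pi_theta >= k) for an integer k, summing the pmf upward from k."""
    if k <= 0:
        return 1.0
    term = math.exp(-theta + k * math.log(theta) - math.lgamma(k + 1))  # pmf at k
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= theta / (i + 1)   # pmf at i + 1
    return min(total, 1.0)

def lc_tail(theta, u):
    """Interpolated tail P(>= j)^(j+1-u) * P(>= j+1)^(u-j) with j = ceil(u - 1)."""
    j = math.ceil(u - 1)
    p_j, p_j1 = poisson_tail(theta, j), poisson_tail(theta, j + 1)
    return p_j ** (j + 1 - u) * p_j1 ** (u - j)

theta = 0.6
print(lc_tail(theta, 2.0), lc_tail(theta, 2.5), lc_tail(theta, 3.0))
```

At integer arguments the interpolated value coincides with the Poisson tail itself; between integers it lies between the two neighboring tail values.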

Proposition 3.11. For all a > 0, y > 0, e € (0, 1), and x €R 

P (r(l-£)o-2 + y^scr^/y^ 

Jr 

(3.22) 

The term $\mathsf{P}^{\mathsf{LC}}(y\tilde\Pi_{\varepsilon\sigma^2/y^2} \ge z)$ in (3.22) is to be evaluated or bounded according
to Proposition 3.10, using at that Remark 1.4.



3.2.2. Asymptotics for large deviations 

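Here and in what follows, for any two expressions $\mathcal{E}_j(x) = \mathcal{E}_{j;\sigma,y,\varepsilon}(x)$ (with
$j = 1, 2$) the notation $\mathcal{E}_1(x) \lesssim \mathcal{E}_2(x)$ will mean "$\mathcal{E}_1(x) \le C\,\mathcal{E}_2(x)$ for some
positive constant factor $C$ not depending on $x$, for all large enough $x > 0$";
$\mathcal{E}_2(x) \gtrsim \mathcal{E}_1(x)$ will mean the same as $\mathcal{E}_1(x) \lesssim \mathcal{E}_2(x)$. Notation like $\mathcal{E}_1(x) \sim \mathcal{E}_2(x)$
will mean, as usual, that $\mathcal{E}_1(x)/\mathcal{E}_2(x) \to 1$.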

Proposition 3.12. For any fixed a > 0, y > 0, and e G (0, 1), 

PU(.) ^ Ce^fy BH(x) exp { [ In^ (l + ^) - 2 In (l + ^)] } 

= (£ + 0(1))^/" BH(a;) 
as x ^ 00, where C := exp{^ip{£ — 1)}. 
Proposition 3.13. For any fixed $\sigma > 0$ and $y > 0$ and (say) all $x \in [y, \infty)$

^ P{yil„yy2 > x) < Be(a;) < C2,o P^{ytlayv^ > x) (3.23) 

<,xP{yn„2/y2 > a;) < a;BH(a;). (3.24) 

Proposition 3.13 implies that for x e [y, oo) the upper bounds BH(a;), Be(a;), 
and C2,o P^^iy^a'^ jy^ 5^ 3;) on P (5 5^ .t) as well as the particular, limit instance 
P (2/11(^2 /j^2 ^ x) of P(iS' ^ x) — are the same up to a power-function factor, of 
the form Cx^l'^ , where C = Ca,y > does not depend on x. 

Proposition 3.14. For any fixed $\sigma > 0$, $y > 0$, and $\varepsilon \in (0, 1)$, and (say) all
$x \in [y, \infty)$

^^7^ ^ PiVa,y,e > x) Pin{x) ^ €3,0 P'^iVa.y.e > x) (3.25) 

^ X PiVa,y,e ^ x) xP\J{x), (3.26) 

where

$$\eta_{\sigma,y,\varepsilon} := \Gamma_{(1-\varepsilon)\sigma^2} + y\tilde\Pi_{\varepsilon\sigma^2/y^2}. \tag{3.27}$$

Proposition 3.14 implies that for x G [y, oo) the upper bounds PU(a;), Pin(a;), 
and C3,o P^''(r(i_e)^2 + yll„2^y2 ^ x) on P{S > x) as well as the particular, 
limit instance P^r^i-e)^^ + yil^^ /y^ ^ x) of P{S ^ x) — are the same up to 
a power-function factor, of the form Ca;^/^, where C = Ca,y,e > does not 
depend on x. 

Thus, Propositions 3.12, 3.13, and 3.14 imply that either of the bounds $\mathrm{PU}(x)$
or $\mathrm{Pin}(x)$ is better than both $\mathrm{BH}(x)$ and $\mathrm{Be}(x)$ by a factor which is decreasing
exponentially fast in $x$, for large enough $x > 0$. More precisely, taking also into
account the inequality $\mathrm{Be}(x) \le \mathrm{BH}(x)$ in Proposition 3.7(I), one immediately
obtains

Corollary 3.15. For any fixed $\sigma > 0$, $y > 0$, and $\varepsilon \in (0, 1)$, and all $x \ge 0$

$$\mathrm{Pin}(x) \le \mathrm{PU}(x) \le (\varepsilon + o(1))^{x/y}\,\mathrm{Be}(x) \le (\varepsilon + o(1))^{x/y}\,\mathrm{BH}(x).$$
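
The exponential rate in Corollary 3.15 can be observed numerically for the pair PU and BH. The Python sketch below evaluates $(\mathrm{PU}(x)/\mathrm{BH}(x))^{y/x}$ on the log scale, so that no underflow occurs for large $x$. It assumes the forms (1.1) and (1.5) of the two bounds; the function names and the parameter values are illustrative.

```python
import math

def log_bh(x, sigma, y):
    """log BH(x) = -(sigma^2 / y^2) * psi(x y / sigma^2)."""
    v = x * y / sigma**2
    return -(sigma**2 / y**2) * ((1 + v) * math.log1p(v) - v)

def log_pu(x, sigma, y, eps, iters=200):
    """log PU(x), with the minimizing lambda found by bisection on f'(lam) = x."""
    a, b = (1 - eps) * sigma**2, eps * sigma**2
    lo, hi = 0.0, min(x / a, math.log1p(x * y / b) / y)   # hi bounds the root
    for _ in range(iters):
        lam = (lo + hi) / 2
        if a * lam + b * math.expm1(lam * y) / y < x:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return -lam * x + a * lam**2 / 2 + b * (math.expm1(lam * y) - lam * y) / y**2

def effective_base(x, sigma=1.0, y=1.0, eps=0.1):
    """(PU(x) / BH(x))^(y/x), computed from logarithms."""
    return math.exp((log_pu(x, sigma, y, eps) - log_bh(x, sigma, y)) * y / x)

for x in (10.0, 100.0, 1000.0):
    print(x, effective_base(x))
```

For $\varepsilon = 0.1$ the printed values decrease toward 0.1 as $x$ grows, although slowly, since the correction to the exponent is only of logarithmic order in $x$.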



Of course, the asymptotically better bounds PU and Pin require information 
on the sum of truncated third moments, in addition to that on the sum of second 
moments. However, it is difficult to imagine a situation when only the latter (but 
not the former) kind of information is available. 

Proposition 3.16. For any fixed a > \ and a > 0, 

$$P_\alpha(\Gamma_{\sigma^2}; x) \sim c_{\alpha,0}\,\mathsf{P}(\Gamma_{\sigma^2} \ge x) \quad\text{as } x \to \infty.$$

Thus, for a centered Gaussian r.v. $\eta$, the optimal upper bound $P_\alpha(\eta; x)$ on
the tail $\mathsf{P}(\eta \ge x)$ differs from it approximately by a constant factor $c_{\alpha,0} \in (1, \infty)$
for large $x > 0$.
If $\eta$ is a centered Poisson r.v. $\tilde\Pi_\theta$, then the asymptotic behavior of the ratio
$P_\alpha(\eta; x)/\mathsf{P}(\eta \ge x)$ is starkly different: it oscillates between nearly 1 and a factor
of the order of $x$, as seen from the following proposition, which also shows that
the factor $x$ in (3.24) cannot be substantially improved. More precisely, one has

Proposition 3.17. For any fixed a > 1 and 9 > 0, 



Pc^ilLe; k-9)r^ 9{tie ^ k - 6) - P{he > k - 9) 



(3.28) 



as Z 3 k ^ oo. 

To illustrate Proposition 3.17, here is the graph of
with $\theta = \sigma^2 = 0.6$ and $y = 1$, over $x \in [0, 7.4]$:

[Figure: graph illustrating Proposition 3.17; the axis label includes $\mathrm{Be}(x)$.]




One can expect the behavior of the ratio $P_\alpha(\eta; x)/\mathsf{P}(\eta \ge x)$ for $\eta = \Gamma_{(1-\varepsilon)\sigma^2} +
y\tilde\Pi_{\varepsilon\sigma^2/y^2}$ and large $x > 0$ to be intermediate between the two kinds described in
Propositions 3.16 and 3.17.



3.2.3. Numerics and graphics for moderate deviations 

In Subsubsection 3.2.2, it was shown that the bounds $\mathrm{Pin}(x)$ and $\mathrm{PU}(x)$ are
much better than $\mathrm{Be}(x)$ and $\mathrm{BH}(x)$ for all large enough $x > 0$. For moderate
deviations, the comparison is more complicated. Recall that the bound $\mathrm{Be}(x) =
P_2(y\tilde\Pi_{\sigma^2/y^2}; x)$ is based on the comparison inequality (1.18) over the class $\mathcal{H}^2_+$
of generalized moment functions $f$, while the bound $\mathrm{Pin}(x) = P_3(\Gamma_{(1-\varepsilon)\sigma^2} +
y\tilde\Pi_{\varepsilon\sigma^2/y^2}; x)$ is based on the comparison inequality (2.4) over the class $\mathcal{F}^3_+$, and
the latter comparison is essentially equivalent to that over the class $\mathcal{H}^3_+$, which
is smaller than $\mathcal{H}^2_+$ (by (1.11)). This is the factor that may make $\mathrm{Be}(x)$ better
than $\mathrm{Pin}(x)$ (and hence better than $\mathrm{PU}(x)$) if $x$ is not so large; this factor
will be especially significant when $\varepsilon$ is close to 1 and thus the role of the
light-tail component $\Gamma_{(1-\varepsilon)\sigma^2}$ is negligible. However, as was noted in the Introduction
concerning non-uniform Berry-Esseen type bounds, in typical applications when
the $X_i$'s do not differ too much in distribution from one another, $\varepsilon$ will be close
to 0, rather than to 1. The interplay between these two factors, namely the presence
of a light-tail component vs. the larger class of generalized moment functions,
is illustrated below.




Here, for $\sigma$ normalized to be 1, and for $\varepsilon \in \{0.1, 0.9\}$ and $y \in \{0.1, 1\}$, the
graphs $G(P) := \{(x, \log_{10}\frac{P(x)}{\mathrm{BH}(x)})\colon 0 < x \le x_{\max}\}$ of the decimal logarithms
of the ratios of the bounds $P = \mathrm{Ca}, \mathrm{PU}, \mathrm{Be}, \mathrm{Pin}$ to the benchmark
Bennett-Hoeffding bound are shown, where $x_{\max}$ equals either 3 or 4, depending on
whether $y = 0.1$ (relatively little skewed-to-the-right summands $X_i$) or $y = 1$
(relatively highly skewed-to-the-right summands $X_i$). The corresponding values
of $\varepsilon$, $y$, and $\mathrm{BH}(x_{\max})$ are shown for each of the four pictures. Note that, for
such choices of $x_{\max}$, the values of $\mathrm{BH}(x_{\max})$ are approximately the same (about
2%), whether $y = 0.1$ or $y = 1$.
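
The benchmark values $\mathrm{BH}(x_{\max})$ quoted above are easy to reproduce. The short Python sketch below evaluates the Bennett-Hoeffding bound (1.1) with $\sigma = 1$ at the two pairs $(y, x_{\max})$ used for the pictures; the function name is an illustrative choice.

```python
import math

def bh(x, sigma=1.0, y=1.0):
    """Bennett-Hoeffding bound exp(-(sigma^2 / y^2) * psi(x y / sigma^2))."""
    u = x * y / sigma**2
    return math.exp(-(sigma**2 / y**2) * ((1 + u) * math.log1p(u) - u))

for y, x_max in ((0.1, 3.0), (1.0, 4.0)):
    print(y, x_max, bh(x_max, 1.0, y))
```

Both values come out between 1.6% and 1.8%, consistent with the "about 2%" stated in the text.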

The graphs $G(P)$ for the bounds $P = \mathrm{PU}$ and $P = \mathrm{Be}$ are shown by the
dot-dashed and solid lines, respectively; the graph $G(\mathrm{Ca})$ too is shown by a solid
line, but only on the interval $(0, u_y)$, on which $\mathrm{Ca} < \mathrm{BH}$, that is, $\log_{10}\frac{\mathrm{Ca}}{\mathrm{BH}} < 0$;
see Proposition 3.7(IV). One can see that, for $y = 1$, $\mathrm{Ca}(x)$ is better than $\mathrm{BH}(x)$
for all $x \in (0, 2.66)$. In accordance with Proposition 3.7(I,II), the graph $G(\mathrm{Ca})$
lies above $G(\mathrm{Be})$ except that the two graphs coincide on the interval $[0, y]$,
even though the graph $G(\mathrm{Be})$ is seen to be very close to $G(\mathrm{Ca})$ well to the right
of the interval $[0, y] = [0, 0.1]$ for $y = 0.1$. For the bound $\mathrm{Pin}$, actually two
approximate graphs are shown: the one given by the thick dashed line was
produced using formula (3.7) (with $s = \ln(1 + y)/y$ and $j = -1$) and the one
given by the thin solid line was produced using formula (3.8); one can see that
the two lines look practically the same, as they should. (However, no other
accuracy control of the performance of the Mathematica numerical integration
command NIntegrate used to evaluate the integrals in (3.7) and (3.8) was
done.) In fact, the graph for $\mathrm{Pin}$ was obtained via a "parametric" setting, as the
set of the form $\{(x, \log_{10}\frac{\mathrm{Pin}(x)}{\mathrm{BH}(x)})\colon x = m(t),\ t = u - 1/u,\ 0.1 \le u \le u_{\max}\}$, where
the function $m$ is as in (3.4) and $u_{\max}$ is the positive root $u$ of the equation
$m(u - 1/u) = x_{\max}$; this way, one has to solve the equation $m(t) = x$ for $t$ only
for $x = x_{\max}$.

These pictures confirm the thesis that, if the weight $\varepsilon$ of the heavy-tail Poisson
component is relatively small, then the bound $\mathrm{Pin}(x)$ is significantly better (i.e.,
smaller) than $\mathrm{Be}(x)$ for (say) $x \ge 3$. If $\varepsilon$ is relatively large, then $\mathrm{Be}(x)$ may be
slightly better than $\mathrm{Pin}(x)$ for moderate $x > 0$ (say for $x \le 4$). Both $\mathrm{Pin}(x)$ and
$\mathrm{Be}(x)$ are significantly better than the Bennett-Hoeffding bound $\mathrm{BH}(x)$, even
for moderate $x > 0$. The bound $\mathrm{PU}(x)$ is close to $\mathrm{BH}(x)$ for moderate $x > 0$
if $\varepsilon$ is close to 1, which is in accordance with Proposition 3.7(V). On the other
hand, if the weight $\varepsilon$ of the heavy-tail Poisson component is small while $y$ is
large enough so that the Poisson component is quite distinct from the Gaussian
component, then $\mathrm{PU}(x)$ is better than $\mathrm{Be}(x)$ even for such rather small $x$ as
$x = 2.5$. Here it is with more detail:

(i) If the weight of the Poisson component is small ($\varepsilon = 0.1$) and the Poisson
component is quite distinct from the Gaussian component ($y = 1$), then
$\mathrm{Be}(x)$ is about 9.93 times worse (i.e., greater) than $\mathrm{Pin}(x)$ at $x = 4$.
Moreover, for these values of $\varepsilon$ and $y$, even the bound $\mathrm{PU}(x)$ is better than
$\mathrm{Be}(x)$ already at about $x = 2.5$.

(ii) If the weight of the Poisson component is small ($\varepsilon = 0.1$) and the Poisson
component is close to the Gaussian component ($y = 0.1$), then $\mathrm{Be}(x)$ is
still about 20% greater than $\mathrm{Pin}(x)$ at $x = 3$.

(iii) If the weight of the Poisson component is large ($\varepsilon = 0.9$) and the Poisson
component is quite distinct from the Gaussian component ($y = 1$), then
$\mathrm{Be}(x)$ is about 8% better than $\mathrm{Pin}(x)$ at $x = 4$. For $x \in [0, 4]$, $\mathrm{Pin}(x)$ and
$\mathrm{Be}(x)$ are close to each other and both are significantly better than either
$\mathrm{BH}(x)$ or $\mathrm{PU}(x)$ (which latter are also close to each other).

(iv) If the weight of the Poisson component is large ($\varepsilon = 0.9$) and the Poisson
component is close to the Gaussian component ($y = 0.1$), then $\mathrm{Be}(x)$ is
about 12% better than $\mathrm{Pin}(x)$ at $x = 3$. For $x \in [0, 3]$, $\mathrm{Pin}(x)$ and $\mathrm{Be}(x)$
are close to each other and both are significantly better than either $\mathrm{BH}(x)$
or $\mathrm{PU}(x)$ (which latter are very close to each other).

In particular, we see that the latter two of the four enumerated cases are quite
similar to each other. That is, if the weight of the Poisson component is large,
then it does not matter much whether the Poisson component is close to the
Gaussian component.

A summary of the comparisons made in this subsubsection and in the
previous one is as follows. For all $x > 0$, bounds $\mathrm{Pin}(x)$ and $\mathrm{Be}(x)$ are
respectively better than the corresponding exponential bounds $\mathrm{PU}(x)$ and $\mathrm{BH}(x)$. For
large $x > 0$, each of the bounds $\mathrm{Pin}(x)$ and $\mathrm{PU}(x)$ is better than $\mathrm{Be}(x)$; the same
may hold even for moderate $x > 0$, especially when the weight $\varepsilon$ of the Poisson
component vs. the weight $1 - \varepsilon$ of the Gaussian one is relatively small; this is
the case in typical applications. Otherwise, that is for relatively large $\varepsilon \in (0, 1)$
and moderate $x > 0$, bound $\mathrm{Be}(x)$ may be a little better than $\mathrm{Pin}(x)$ and
significantly better than $\mathrm{PU}(x)$. (For comparisons showing that $\mathrm{BH}(x)$ is superior
to the bounds known prior to Bennett's work, see [1].)
Overall, the upper bound $\mathrm{Pin}(x)$ introduced in this paper usually outperforms
the other three bounds: $\mathrm{BH}(x)$, $\mathrm{PU}(x)$, and $\mathrm{Be}(x)$. The minimum $\mathrm{Pin}(x) \wedge \mathrm{Be}(x)$
will in all cases be better (and usually significantly better) than $\mathrm{PU}(x) \wedge \mathrm{BH}(x)$.

These relations are illustrated by the following diagram:

    BH --r--> PU
    |         |
    i         i
    v         v
    Be --pr-> Pin

In particular, it shows that PU is a refinement (denoted by r) of BH. This
refinement is also an improvement, as is obviously the case with any refinement
that is exact in its own terms; indeed, the more specific the terms, the better
the best possible result is; the usual downside of a refinement, though, is that
it is more difficult to deal with: in terms of getting more specific information on
the distributions of the $X_i$'s, as well as proving and computing the bound. Also,
PU may be considered as a generalization of BH, as BH may be considered as
a special, limit case of PU, with $\varepsilon \uparrow 1$.

The relation of Pin with Be is almost parallel to that of PU with BH.
However, the refinement (and hence the improvement and generalization) here are
only partial (pr), because, as discussed, the class $\mathcal{H}^3_+$ (corresponding to Pin) is
a bit smaller than $\mathcal{H}^2_+$ (corresponding to Be), even though, according to
Propositions 2.5 and 2.12, $\mathcal{H}^3_+$ is essentially the largest possible class for Pin, just as
$\mathcal{H}^2_+$ is for Be.

The relations of Be to BH and Pin to PU are pure improvements (i), due to
using the larger classes in place of the smaller class of exponential moment
functions.

4. Proofs 

In Subsection 4.1 of this section, we shall first state several lemmas; based on 
these lemmas, we shall provide the necessary proofs of results stated in Sections 2 
and 3. Proofs of the lemmas will be deferred to Subsection 4.2. We believe that 
such a structure will allow us to effectively present first the main ideas of the 
proofs and then the details. 
4.1. Statements of lemmas, and proofs of theorems, corollaries, and
propositions

First here, let us state a few lemmas used in the proofs of Theorem 2.1 and
Proposition 2.3. We shall need more notation.

Let $\sigma$ and $y$ be any (strictly) positive real numbers. For any pair of numbers
$(a, b)$ such that $a > 0$ and $b > 0$, let $X_{a,b}$ denote any r.v. whose distribution is
$\frac{b}{a+b}\delta_{-a} + \frac{a}{a+b}\delta_b$, the unique
zero-mean distribution on the two-point set $\{-a, b\}$; here and in what follows
$\delta_x$ stands, as usual, for the (Dirac) distribution concentrated at point $x$.

Lemma 4.1. For all x G {—oo,y], 

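Lemma 4.2. Let $X$ be any r.v. such that $X \le y$ a.s., $\mathsf{E}X \le 0$, and $\mathsf{E}X^2 \le \sigma^2$.
Then

$$\mathsf{E}X_+^3 \le \frac{y^3\sigma^2}{y^2+\sigma^2}. \tag{4.1}$$

Lemma 4.3. For any

$$\beta \in \Big(0, \frac{y^3\sigma^2}{y^2+\sigma^2}\Big] \tag{4.2}$$

there exists a unique pair $(a, b) \in (0, \infty) \times (0, \infty)$ such that $X_{a,b} \le y$ a.s.,
$\mathsf{E}X_{a,b}^2 = \sigma^2$, and $\mathsf{E}(X_{a,b})_+^3 = \beta$; more specifically, $b$ is the only positive root of
equation

$$\sigma^2 b^3 = \beta(b^2 + \sigma^2), \tag{4.3}$$

and

$$a = \frac{\sigma^2}{b} = \frac{\beta b}{b^3 - \beta}. \tag{4.4}$$

In particular, Lemma 4.3 implies that inequality (4.1) is exact.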
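
The construction in Lemma 4.3 can be sketched numerically. The Python code below finds $b$ as the root in $(0, y]$ of the cubic $\sigma^2 b^3 = \beta(b^2 + \sigma^2)$, sets $a = \sigma^2/b$, and then recomputes the three moments of the two-point distribution on $\{-a, b\}$ with zero mean. The function names are hypothetical, and the code assumes that $\beta$ does not exceed $y^3\sigma^2/(y^2+\sigma^2)$, so that the root lies in $(0, y]$.

```python
def solve_two_point(sigma, beta, y, iters=200):
    """Pair (a, b): b solves sigma^2 b^3 = beta (b^2 + sigma^2) on (0, y], a = sigma^2 / b."""
    h = lambda b: sigma**2 * b**3 - beta * (b**2 + sigma**2)
    lo, hi = 0.0, y   # h(0) < 0, and h(y) >= 0 under the assumed bound on beta
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2
    return sigma**2 / b, b

def moments(a, b):
    """Mean, second moment, and third moment of the positive part of X_{a,b}."""
    p_neg, p_pos = b / (a + b), a / (a + b)   # P(X = -a) and P(X = b)
    return (-a * p_neg + b * p_pos, a**2 * p_neg + b**2 * p_pos, b**3 * p_pos)

a, b = solve_two_point(sigma=1.0, beta=0.1, y=1.0)
print(a, b, moments(a, b))
```

For $\sigma = y = 1$ and $\beta = 0.1$ the root is $b = 0.5$, so that $a = 2$, and the recomputed moments are 0, $\sigma^2$, and $\beta$.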

For any given $w \in \mathbb{R}$, $y > 0$, $\sigma > 0$, and $\beta > 0$, consider now the problem
of finding the exact upper bound of $\mathsf{E}(X - w)_+^3$ over all r.v.'s $X$ satisfying
the conditions $X \le y$ a.s., $\mathsf{E}X = 0$, $\mathsf{E}X^2 = \sigma^2$, and $\mathsf{E}X_+^3 = \beta$. At that, by
Lemma 4.2, w.l.o.g. condition (4.2) holds, since otherwise the corresponding set
of r.v.'s $X$ is empty.

Lemma 4.4. Fix any w € M., y > 0, a > 0, and j3 satisfying condition (4.2), 
and let (a, b) be the unique pair of numbers described in Lemma 4-3. Then 

sup{E(X -w)l:X^y a.s., EX = 0, EX^ = £7^ EX^ = /?} 

= max{E{X -w)l: X ^y a.s.,EX = 0,EX^ = a^,EXl = p} (4.5) 

= max{E{X -w)l: X ^y a.s., EX ^0,EX'^ ^ c^^EX^ ^ /3} (4.6) 
^(E{Xa,b-w)l if w^O, 
where

$$\tilde b := y \quad\text{and}\quad \tilde a := \frac{\beta y}{y^3 - \beta} \tag{4.8}$$

(cf. (4.4)). At that, $\tilde a > 0$, $X_{\tilde a,\tilde b} \le y$ a.s., $\mathsf{E}X_{\tilde a,\tilde b} = 0$, and $\mathsf{E}(X_{\tilde a,\tilde b})_+^3 = \beta$, but one
can only say that $\mathsf{E}X_{\tilde a,\tilde b}^2 \le \sigma^2$, and the latter inequality is strict if $\beta \ne \frac{y^3\sigma^2}{y^2+\sigma^2}$.

Together with Lemma 4.8 below, Lemma 4.4 represents one of the two most
important steps in the proof of Theorem 2.1.

Lemma 4.5. Let ^ and i] be any real-valued r.v. 's such that ^ Ejj, E^^ ^ 
E??2 < oo, and E/(0 £/(?]) for all f (zH^. Then 

(i) inequality E/(^) ^ ^fiv) will hold for all f G .F^; 

(a) if the condition E^ ^ Er] is replaced by E(, = Er], then the inequality 
E./(C) =^ E/(ry) will hold for all f in the larger class ^, defined in 
Proposition 2.2; 

(Hi) if the conditions E^ < Er/ and E^ ^ Erf are both replaced by the equali- 
ties — E-q and E^"^ ~ E??^, then the inequality E/(^) < ^fiv) hold 
for all f in the larger class J-'^ 

(iv) however, it is not enough to replace the condition E^^ ^ E?]^ &?/ the equal- 
ity E^^ = E?7^ for the inequality E/(^) < Ef{r]) to hold for all f in the 
larger class T\ ^ defined by removing f from the list "f,f',f",f""' in 
(2.1). 

Lemma 4.6. Let S, and rj be any real-valued r.v. 's such that E^ ^ Erj, E^^ ^ 
E??^ < oo, and Ef{^) < Ef{r]) for all f &n\. Then 

(i) inequality E/(^) ^ E /(r/) will hold for all f € JF^; 

(ii) if the condition E^ ^ Erj is replaced by E^ = Er], then the inequality 
E/(C) =^ E/(?7) will hold for all f in the larger class T\ ^, defined in 
Proposition 2.9; 

(Hi) if the conditions E£^^E-q and E^^ ^ Ejj^ are both replaced by the equali- 
ties E^ = Ery and ESf = Erf , then the inequality Ef{£) ^ Ef{r]) will hold 
for all f in the larger class T\ y^; 

(iv) however, it is not enough to replace the condition E^^ ^Erj^ by the equal- 
ity E^^ = Erf for the inequality E/(^) ^ Ef(r]) to hold for all f in the 
larger class T\ ^ defined by removing f from the list "/, /', /" " in (2.14). 

Lemma 4.7. Let ao,l3o,cr,l3 be any real numbers such that < ao < (t, < 
/3o ^ /3o ^ fgy, and l3 < a'^y. Then 

^fC^^a^.-M + y^ffo/y') < E/(r^2_^/y + 2/n^/j,3) (4.9) 

for all f e H"^, and hence for all f G T\ and for all f G 

Lemma 4.8. Let X be any r.v such that X ^ y a.s., EX ^0, EX"^ < cr^, and 
EX^ < /?, where (3 satisfies condition (4.2). Then for all f € T\ 

Ef{X) < Ef{T,._p/y+ytl0,y.). (4.10) 
Lemma 4.9. (Recall here the definition of $X_{a,b}$ in the beginning of Section 4.)
Fix any a > 0, y > 0, and e E (0,1). Let then (3 :— scr'^y, in accordance 
with (2.2). Then for each large enough m € {1, 2, . . . } there exist positive real 
numbers a = and b = bm such that the following statement is true: 

if n := 2ni and Xi, . . . , X„ are independent r.v. 's such that Xi, . . . , X„i 
are independent copies o/ Xf,/^/^ f,^^/^ and X^+i, ■ ■ ■ j X2m are independent 
copies of Xa/m,y! then Xi, . . . ,X„ satisfy conditions (2.3), with equalities in 
place of the first three inequalities there. 

Moreover, then S = Xi-\ \-Xn converges in distribution to r(^i_^'j„2+yllg^2 j^i 

as m ^ CO. 

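Proof of Theorem 2.1. Let $\sigma_i^2 := \mathsf{E}X_i^2$, $\beta_i := \mathsf{E}(X_i)_+^3$, $\sigma_0^2 := \sum_{i=1}^n \sigma_i^2$, $\beta_0 :=
\sum_{i=1}^n \beta_i$, $T_i := \Gamma_{\sigma_i^2 - \beta_i/y} + y\tilde\Pi_{\beta_i/y^3}$, and $T := \sum_{i=1}^n T_i$. Then, by a standard
argument (cf. e.g. the proof of [43, Theorem 2.1]) based on Lemma 4.8, one has

$$\mathsf{E} f(S) \le \mathsf{E} f(T) = \mathsf{E} f(\Gamma_{\sigma_0^2 - \beta_0/y} + y\tilde\Pi_{\beta_0/y^3}) \quad\text{for all } f \in \mathcal{F}^3_+.$$

On the other hand, it is clear from (2.3) that $0 \le \sigma_0 \le \sigma$ and $0 \le \beta_0 \le \beta$; next,
$\beta_i \le \sigma_i^2 y$ for all $i = 1, \ldots, n$ and hence $\beta_0 \le \sigma_0^2 y$; also, by (2.2), $\sigma^2 - \beta/y =
(1 - \varepsilon)\sigma^2$ and $\beta/y^3 = \varepsilon\sigma^2/y^2$. It remains to use Lemma 4.7. □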

Proof of Proposition 2.2. This follows by Lemma 4.5(ii,iii). □ 

Proof of Proposition 2.3. This follows by Lemma 4.9 and the Fatou lemma for
convergence in distribution; see e.g. [7, Theorem 5.3]. □



Proof of Proposition 2.5. To obtain a contradiction, suppose that for some $p \in
(0, 3)$ one can replace $\mathcal{H}^3_+$ in Corollary 2.4 by $\mathcal{H}^p_+$. By (1.11), w.l.o.g. $p \in (2, 3)$.
Take any $a \in (0, 1)$ and introduce the new variable

$$\tau := \frac{a}{\sqrt{1+a}}.$$

Next, take any $n \in \{1, 2, \ldots\}$ and let $X_1 = X_{a,1}$ and $X_2 = \cdots = X_n = 0$ a.s.
(recall the definition of $X_{a,b}$ at the beginning of Section 4). Then conditions (2.2)
and (2.3) hold for $y = 1$, $\sigma = \sqrt{a}$, and $\beta = \frac{a}{1+a}$; at that, $\varepsilon = \frac{1}{1+a}$, $(1-\varepsilon)\sigma^2 = \tau^2$,
$\varepsilon\sigma^2 = \frac{a}{1+a}$, and $-\varepsilon\sigma^2 + a = \tau^2$. Note that the function $x \mapsto f_{-a}(x) := (x + a)_+^p$
belongs to the class $\mathcal{H}^p_+$. Consider

$$\mathcal{E}_1(a) := \mathsf{E} f_{-a}(S) \quad\text{and}\quad \mathcal{E}_2(a) := \mathsf{E} f_{-a}(\Gamma_{(1-\varepsilon)\sigma^2} + \tilde\Pi_{\varepsilon\sigma^2}),$$

respectively the left-hand side and the right-hand side of inequality (2.4) with
$f = f_{-a}$. Then

$$\mathcal{E}_1(a) = \mathsf{E}(X_{a,1} + a)_+^p = \mathsf{E}(X_{a,1} + a)^p = a(1+a)^{p-1} = a + (p-1)a^2 + o(a^2); \tag{4.11}$$

here and in the rest of the proof of Proposition 2.5, the limit relations are
understood as $a \downarrow 0$.

imsart ver. 2005/05/19 file: arxiv.tex date: February 24, 2009 



losif Pinelis/On the Bennett-Hoeffding bound 25 
On the other hand,

$$\mathcal{E}_2(a) = \mathsf{E}(\Gamma_{\tau^2} + \Pi_{a/(1+a)} + \tau^2)_+^p = \mathcal{E}_{2,0}(a) + \mathcal{E}_{2,1}(a) + \mathcal{E}_{2,2}(a) + \mathcal{E}_{2,\ge 3}(a), \tag{4.12}$$

where

$$\mathcal{E}_{2,k}(a) := \mathsf{P}(\Pi_{a/(1+a)} = k)\,\mathsf{E}(\Gamma_{\tau^2} + \tau^2 + k)_+^p = e^{-a/(1+a)}\,\frac{(a/(1+a))^k}{k!}\,\mathsf{E}(\tau Z + \tau^2 + k)_+^p,$$

$$\mathcal{E}_{2,\ge 3}(a) := \sum_{k=3}^{\infty} \mathcal{E}_{2,k}(a),$$

and $Z$ is a standard normal r.v. Note that $(\tau Z + \tau^2 + k)_+^p = O(\tau^p|Z|^p + \tau^{2p} + k^p)$,
whence

$$0 \le \mathsf{E}(\tau Z + \tau^2 + k)_+^p = O(\tau^p + k^p) \tag{4.13}$$

over all $k \ge 0$. So,

$$\mathcal{E}_{2,0}(a) = O(\tau^p) = O(a^p) = o(a^2), \tag{4.14}$$

since $p \in (2, 3)$. Similarly using (4.13), it is easy to see that

$$\mathcal{E}_{2,\ge 3}(a) = O(a^3) = o(a^2). \tag{4.15}$$

By dominated convergence, $\mathsf{E}(\tau Z + \tau^2 + 2)_+^p = 2^p + o(1)$. Hence,

$$\mathcal{E}_{2,2}(a) = \frac{a^2}{2}\,2^p(1 + o(1)) = 2^{p-1}a^2 + o(a^2). \tag{4.16}$$

To estimate $\mathcal{E}_{2,1}(a)$, introduce $h(t, z) := \mathsf{E}(1 + tzR + t^2)_+^p$, where $R := X_{1,1}$ is
a Rademacher r.v. which is independent of $Z$. Then $h(0, z) = 1$, $h'_t(0, z) = 0$,
$|h''_{tt}(t, z)| = O(|z|^p + 1)$, and so, $\mathsf{E}(\tau Z + \tau^2 + 1)_+^p = \mathsf{E}h(\tau, Z) = 1 + O(\tau^2) =
1 + o(a)$, which implies

$$\mathcal{E}_{2,1}(a) = e^{-a/(1+a)}\,\frac{a}{1+a}\,(1 + o(a)) = a - 2a^2 + o(a^2). \tag{4.17}$$

Thus, (4.12), (4.14), (4.15), (4.16), and (4.17) yield $\mathcal{E}_2(a) = a + (2^{p-1} - 2)a^2 +
o(a^2)$. So, recalling (4.11), one has $\mathcal{E}_2(a) - \mathcal{E}_1(a) = g(p)a^2 + o(a^2)$, where $g(p) :=
2^{p-1} - 1 - p$. Observe that $g(2) = -1 < 0 = g(3)$ and $g$ is a convex function, so
that $g(p) < 0$ for all $p \in (2, 3)$. Therefore, the difference $\mathcal{E}_2(a) - \mathcal{E}_1(a)$ between
the right-hand side of inequality (2.4) (with $f = f_{-a}$) and its left-hand side is
negative for small enough $a \in (0, 1)$. This contradiction concludes the proof of
Proposition 2.5. □
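
The elementary ingredients of this argument can be checked numerically. The Python sketch below evaluates $g(p) = 2^{p-1} - 1 - p$ on $(2, 3)$ and compares the exact expression $a(1+a)^{p-1}$ and the Poisson factor $e^{-\theta}\theta$, with $\theta = a/(1+a)$, against their second-order expansions in $a$. The function names are hypothetical, and the Gaussian factor $1 + o(a)$ of $\mathcal{E}_{2,1}(a)$ is deliberately left out.

```python
import math

def g(p):
    """Coefficient of a^2 in the difference of the two sides: 2^(p-1) - 1 - p."""
    return 2 ** (p - 1) - 1 - p

def e1(a, p):
    """Exact left-hand side a (1 + a)^(p - 1)."""
    return a * (1 + a) ** (p - 1)

def poisson_factor(a):
    """P(Pi_theta = 1) = exp(-theta) * theta with theta = a / (1 + a)."""
    theta = a / (1 + a)
    return math.exp(-theta) * theta

a, p = 1e-3, 2.5
print(g(p), (e1(a, p) - a) / a**2, (poisson_factor(a) - a) / a**2)
```

For small $a$ the last two printed values are close to $p - 1$ and $-2$, respectively, matching the coefficients of $a^2$ in the expansions.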



Proof of Corollary 2.7. Let S ■=J2i^i ^{Vi > '^^} and a := VS^i ^{Vi > 
and let a be defined as in (2.9). Just as was noted concerning condition (2.2), 

w.l.o.g. let us assume that e := € (0, 1). Then, by Theorem 2.1, 

Ef{x + S) ^ E/(x + r(i_g)^2 +t/n,52/^2) (4.18) 
for all $x \in \mathbb{R}$ and $f \in \mathcal{F}^3_+$, since the class is obviously invariant with respect
to the shifts J^l 3 f{-) ^ f{x + ■), hi all a; e M. On the other hand, by [46, 
Corollary 1 with p = ^ and (10)], 

Ef{S-S + z)^Ef{T,2_^2+z) 

for all 2; e M and / G J^^, where one may assume that the r.v. T^2_^2 is 
independent of the r.v.'s S, T(^i_g-f^2, and 11^5-2/^2 In (4.18). Using now (4.18) 
and the independence of S — S and S, for all / G J-"^ one has 

Ef{S)= [ Ef{S-S + z)P{SG dz) 

^ / Ef{T,2_^2+z)P{SG dz) 
Jm 

= Ef{T,2_^2+S) 

= [ Ef{x + S)P{T,2_^2 e dx) 

JR 

< / Ef{x + r(^l_g)^2+yflg^2/y2)P{T„2_^2£dx) 

= E/(r^2_a,2 +r(l_j)^2 +yllg^2/y2) 
= E/(r(l-£)a2 +yt[g„2/y2), 

since eo"^ = ea'^. Thus, inequality (2.10) is proved, which in turn implies in- 
equalities (2.11) and (2.12) (cf. Corollary 2.6). □ 

Proof of Proposition 2.8. As noted in the Introduction, inequality (2.16) for all
$f$ of the form $f(x) = (x - t)_+^2$ was obtained by Bentkus [2, 4]. By the Fubini
theorem, one has (2.16) for all $f \in \mathcal{H}^2_+$. Then the extension to all $f \in \mathcal{F}^2_+$ follows
by Lemma 4.6(i). □

Proof of Proposition 2.9. This follows by Lemma 4.6(ii,iii). □

Proof of Proposition 2.10. This proof is quite similar to (and even somewhat
simpler than) that of Proposition 2.3. □

Proof of Proposition 2.12. This proof is somewhat similar to but much simpler
than that of Proposition 2.5. To obtain a contradiction, suppose that for some
$p \in (0, 2)$ one can replace $\mathcal{H}^2_+$ in Corollary 2.11 by $\mathcal{H}^p_+$. W.l.o.g. $p \in (1, 2)$. Take
any $n \in \{1, 2, \ldots\}$ and let $X_1 = X_{1,1}$ and $X_2 = \cdots = X_n = 0$ a.s. Then for
$y = \sigma = 1$ and all $i$ one has $\mathsf{E}X_i = 0$, $X_i \le y$ a.s., and $\sum_i \mathsf{E}X_i^2 = \sigma^2$. The
function $x \mapsto f_{-1}(x) := (x + 1)_+^p$ belongs to the class $\mathcal{H}^p_+$. Then the left-hand
side and the right-hand side of inequality (2.16) with $f = f_{-1}$ are, respectively,
$\mathcal{E}_1(p) := \mathsf{E}(X_{1,1} + 1)_+^p = 2^{p-1}$ and $\mathcal{E}_2(p) := \mathsf{E}\Pi_1^p$. Observe that the function
$p \mapsto \mathcal{E}_2(p)/\mathcal{E}_1(p)$ is strictly convex on the interval $(1, 2)$, and its values at the
endpoints 1 and 2 of the interval are 1. It follows that $\mathcal{E}_2(p)/\mathcal{E}_1(p) < 1$ for all
$p \in (1, 2)$. □
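
The ratio used in this proof is easy to evaluate numerically. The Python sketch below computes $\mathsf{E}\Pi_1^p$ by summing the series $\sum_{k \ge 1} k^p e^{-1}/k!$ and divides it by $2^{p-1}$; the function names and the truncation at 200 terms are illustrative assumptions.

```python
import math

def poisson_moment(p, theta=1.0, terms=200):
    """E (Pi_theta)^p, summing k^p * P(Pi_theta = k) over k = 1, ..., terms."""
    term = math.exp(-theta) * theta   # P(Pi_theta = 1)
    total = 0.0
    for k in range(1, terms + 1):
        total += k**p * term
        term *= theta / (k + 1)       # P(Pi_theta = k + 1)
    return total

def ratio(p):
    """Right-hand side over left-hand side: E (Pi_1)^p / 2^(p - 1)."""
    return poisson_moment(p) / 2 ** (p - 1)

for p in (1.0, 1.25, 1.5, 1.75, 2.0):
    print(p, ratio(p))
```

The ratio equals 1 at $p = 1$ and $p = 2$ and stays below 1 in between (about 0.97 at $p = 1.5$), which is the contradiction the proof relies on.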
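Proof of Proposition 3.1. Take indeed any $\sigma > 0$, $y > 0$, $\varepsilon \in (0, 1)$, and $x \ge 0$.
Let for brevity $f := \ln \mathrm{PU}_{\exp}$. Then, by the definition in (1.6), $\mathrm{PU}(x) =
\exp \inf_{\lambda \ge 0}(-\lambda x + f(\lambda))$. By the definition of $\mathrm{PU}_{\exp}$ in (1.5),

$$f'(\lambda) = \lambda(1 - \varepsilon)\sigma^2 + \frac{e^{\lambda y} - 1}{y}\,\varepsilon\sigma^2, \tag{4.19}$$

which increases from 0 to $\infty$ as $\lambda$ does so. Thus, there exists a unique root
$\lambda = \lambda_x$ in $[0, \infty)$ of the equation $f'(\lambda) = x$, and $\lambda_x$ is the unique minimum
point for $-\lambda x + f(\lambda)$ over all $\lambda \in [0, \infty)$, so that (3.1) holds, with the so defined
$\lambda_x$. It is also clear now that $\lambda_x = (f')^{-1}(x)$ increases from 0 to $\infty$ as $x$ does so;
that is, the last sentence of Proposition 3.1 is verified.

Next, rewrite the equation $f'(\lambda) = x$ as $e^{\lambda y} = \frac{w}{\kappa}$ and then as $w e^w = \kappa e^{(1+\tau)\kappa}$,
in terms of the new variable $w := (1 + \tau)\kappa - \lambda y$, where $\tau := \frac{xy}{\varepsilon\sigma^2}$ and $\kappa := \frac{\varepsilon}{1-\varepsilon}$,
so that $\lambda = \frac{(1+\tau)\kappa - w}{y}$. Now one sees that $\lambda_x$ defined above in this proof as the
unique root of equation $f'(\lambda) = x$ also satisfies definition (3.3).
By (1.5),

$$e^{-\lambda x}\,\mathrm{PU}_{\exp}(\lambda) = \exp\Big\{-\lambda x + \frac{\lambda^2}{2}(1 - \varepsilon)\sigma^2 + \frac{e^{\lambda y} - 1 - \lambda y}{y^2}\,\varepsilon\sigma^2\Big\}. \tag{4.20}$$

Now use again the mentioned equation $e^{\lambda y} = \frac{w}{\kappa}$ to substitute $\frac{w}{\kappa}$ for $e^{\lambda y}$ in the
expression (4.20) and then substitute there $\frac{(1+\tau)\kappa - w}{y}$ for $\lambda$. Then (3.2) follows
by simple algebra. □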
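
The change of variables in this proof expresses $\lambda_x$ through the Lambert $W$ function: $w$ is the positive solution of $w e^w = \kappa e^{(1+\tau)\kappa}$. The Python sketch below is one possible way to evaluate it, using a hand-written Newton iteration for $W$ rather than a library routine; the function names are hypothetical, and the direct exponentiation may overflow when $(1+\tau)\kappa$ is very large.

```python
import math

def lambert_w(z, iters=100):
    """Principal branch W(z) for z > 0, by Newton's method on w * exp(w) = z."""
    w = math.log1p(z)   # starting point to the right of the root
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (w + 1))
    return w

def lam_x(x, sigma, y, eps):
    """lambda_x = ((1 + tau) kappa - w) / y with w e^w = kappa e^{(1 + tau) kappa}."""
    tau = x * y / (eps * sigma**2)
    kappa = eps / (1 - eps)
    w = lambert_w(kappa * math.exp((1 + tau) * kappa))
    return ((1 + tau) * kappa - w) / y

def fprime(lam, sigma, y, eps):
    """f'(lam) = lam (1 - eps) sigma^2 + (e^{lam y} - 1) / y * eps sigma^2."""
    return lam * (1 - eps) * sigma**2 + math.expm1(lam * y) / y * eps * sigma**2

lam = lam_x(2.5, 1.0, 1.0, 0.1)
print(lam, fprime(lam, 1.0, 1.0, 0.1))
```

Substituting the computed value back into $f'$ returns $x$ up to rounding, which confirms that the Lambert-$W$ expression and the root of $f'(\lambda) = x$ coincide.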

Proof of Proposition 3.2. 

(i) The continuity of m on (— oo, x,) follows by the condition Ery" < oo and 
dominated convergence. 

The left continuity of m at x» follows by (3.4) and the definition rn(a;*) := x*. 
Indeed, in view of the first expression for m{t) in (3.4), 

m{t)>t for alH G (-oo,a;*), (4-21) 

whence m{t) — !■ oo = as t "f .t* in the case when .t* = oo. Now if a;* < oo then, 
in view of the last expression for m{t) in (3.4), — m{t) = — — ^^^.-i^ — G 

[0, — t] for all t G (— oo, a;*), so that m{t) ^ a;* as t t 2;* in this case as well. 

That m{t) = a;* for all t G [x„ , a;*] also follows in view of the last expression 
for m{t) in (3.4), taking also into account the definition of a;**, which implies 
that (?7 — t)+ = (a:* — t) I{r] = a;*} a.s. for ah t G [a;**, a;*]. 

That the function m is strictly increasing on (—00, a;**), with m((— 00)+) = 
Erj, follows immediately from parts (i) and (ii) of [38, Theorem 2.5]. This com- 
pletes the proof of part (i) of Proposition 3.2. 

(ii) Part (ii) of Proposition 3.2 follows immediately from its part (i), taking 
also into account (4.21). 

(iii) The first equality in part (iii) of Proposition 3.2 follows immediately 
from part (iv) of [38, Theorem 2.5]; the second equality follows by (3.5) and 




(3.4). (The natural condition a; < a;, was missing in parts (iii) and (iv) of [38, 
Theorem 2.5]; thanks are due to Bentkus for having drawn my attention to that 
omission.) 

(iv) Part (iv)(a) of Proposition 3.2 follows from the last sentence of [38, 
Theorem 2.5] and Proposition 3.3, to be proved next. 

Let us now verify part (iv)(b). Take indeed any x G [a;*,oo). Then for all 
t e (— oo, x), by the already proved part (i) of Proposition 3.2, one has m{t) ^ 
m{x^) = a;, < a;, and so, by the second displayed formula on [38, page 302], 

is nonincreasing in t G (— oo,.t). Recalling now (1.14) and taking also into 
account that rj ^ x a.s. for all a; € [a;*, oo), one sees that 

Po,ir];x)= inf F{t,x) (4.23) 

tE{ — oo,x) 

(4.24) 

since < hm^ja; ^^'"^^^ < ^i^t^x E l{r] G {t,x)} = 0. This completes 

the proof of part (iv) of Proposition 3.2. 

(v) (a) Take any x and y such that Erj < x < y < x^. Then F{t, x) > F{t, y) 
for each t < {-oo,x). Hence, by (3.6), (4.22), and (4.23), Pa{mx) = F{tx,x) > 
F{tx,y) ^ Paivw)- This proves part (v)(a) of Proposition 3.2. 

(v)(b) By parts (i) and (ii) of Proposition 3.2, the function x is contin- 
uous on (Er^,a;*). Also, E{r] — t)"^ and E{r] — t)"~^ are continuous in f e M, by 
the condition E ry" < oo and dominated convergence. Hence, in view of the last 
expression in (3.6), the fimction x Pa{i]:x) is continuous on (Er/,.T*). 

Consider now the right continuity at Erj. Let a; | E77. Then, by parts (i) and 
(ii) of Proposition 3.2, tx — > — 00. IfEr; > —00 then, by the condition E r/" < 00, 
one has Ery € K, so that x — t.^ ^ Erj — ^ —tx- 

Let us show that the conclusion that x — tx ''^ —tx holds when Erj = —00. 
Note that ^ (1 + ??+)" for all t < —1; hence, by dominated convergence, 

E{r^-t)l^{-tr (4.25) 

and, similarly, E(r/ - t)^^'^ ~ ("O""^ as t -00. So, by (3.4), m{t) = t + 
(— i)(l + 0(1)) = o{\t\) as t ^ —00. It follows by part (ii) of Proposition 3.2 
that X = o{tx) (as x [ Er;), which indeed implies x — t^ ^ —t^, even in the case 
when Er] = —00. Therefore, by the first equality in (3.6), (4.25), and part (iv) 
of Proposition 3.2, 

Ejr, - txn HV - txn 1 p / p ^ 

^«(^;^) = 1^3^ - ^_txr " = ''^ 

as a; J, E 77, which concludes the proof of the right continuity at Er]. 
To complete the proof of part (v)(b) of Proposition 3.2, it remains to verify
the left continuity at x*. The easier case here is when x* = ∞; then

as X — > oo = x*, by the definitions (1.14) and (4.22) of Pa{ri;x) and F{t,x), 
the condition E ry" < oo and dominated convergence, and part (iv) of Proposi- 
tion 3.2. 

Assume now that a;* < oo. Let a; t a;*. Introduce ix ■= x — ^a;* — x, so that 
ix < X, ix '\ a;*, x^ — ix x — ix, and hence 



(^34y^ ^It^J EI{,e(t„a:.)} 



0, 



which in turn implies 



p/r X E{r]-tx)"I{ve{tx,x,]} ^ p / ^ 

F{tx,x) = f y— ^ '-^ P(77 = a;*) =Pa(r/;a;*), 

{x - txr 

by part (iv)(b) of Proposition 3.2. It is clear from the definition (1.14) of Pa{r]; x) 
that Pa{-q-.x) > Pa{r];y) whenever — oo < a; < y < oo. Hence, recalling again 
the definition (4.22) of F{t,x), one has 

Paimx*) < lim Pa{ri;x) < lim F{ix,x) = Pa{r];x*), 

X'\ X ite 3C Jit 

which implies the left continuity at x*. Thus, part (v)(b) of Proposition 3.2 is 

completely proved. 

(vi) Part (vi) of Proposition 3.2 follows immediately from the definitions 
(1.14), (3.5), and (3.4). 

□ 

Proof of Proposition 3.3. For brevity, let us denote the infima in (3.9) and (3.10) 
by infi and inf2, respectively. Take any a: G M. 

Then infg < Paiv^ x), because the function u i— > (m — t)°l_ is in for every 

t G R. If inf2 < Pa{ri;x), then there is some / G such that f{x) > and 

W < for all t G (-^,,T); but, by (1.10), /(u) = /(_^ „)(u-t)Xdi) 

for some nonnegative measure /x and all u G M; at that, /Lt((— oo, a;)) ^ 0, since 
f{x) > 0; so, 

J\X) J{-oo,x) J {-oo,x) 

by the Fubini theorem. This contradiction shows that inf2 = Pa{ri;x). 

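It remains to show that $\inf_1 = \inf_2$. Take any $f \in \mathcal{H}^\alpha_+$ such that $f(x) > 0$,
and let $g := g_f := \frac{f}{f(x)}$. Then $\frac{\mathsf{E}f(\eta)}{f(x)} = \mathsf{E}g(\eta)$, $g \in \mathcal{H}^\alpha_+$, $g$ is nonnegative and
nondecreasing, and $g(x) = 1$. It follows that $g(u) \ge \mathrm{I}\{u \ge x\}$ for all $u \in \mathbb{R}$.
Thus, $\inf_1 \le \inf_2$.
Vice versa, take any $f \in \mathcal{H}^\alpha_+$ such that $f(u) \ge \mathrm{I}\{u \ge x\}$ for all $u \in \mathbb{R}$. Then
$f(x) \ge 1$, and so, $f \ge \frac{f}{f(x)} = g \in \mathcal{H}^\alpha_+$ and $g(x) = 1$. Hence, $\mathsf{E}f(\eta) \ge \mathsf{E}g(\eta) =
\frac{\mathsf{E}f(\eta)}{f(x)}$, which implies $\inf_1 \ge \inf_2$. So, indeed $\inf_1 = \inf_2$. □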

Proof of Proposition 3.4- The first equality in (3.12) follows easily from (1.11) 
and Proposition 1.1; indeed, for any convex f : R ^ R such that /(— 00+) = 
one has / ^ and /' ^ on M. As for the second equality in (3.12), it follows by 
the Bernstein theorem on completely monotone functions (see, e.g., [15] or [33]) 
and the fact that the Laplace transform of a measure uniquely characterizes the 
measure — cf. Remark 3.5 in [45] . Indeed, take any / G . Then for each w G 
[0,00) the function (— oo,0) B x 1-^ fw{x) = f{x + w) is completely monotone, 
in the sense that fw^ ^ on K for all j = 0, 1, . . . ; hence, there exists a unique 
nonnegative Borel measure on [0, 00) such that f{x + w) = /jq e*^yK,i,( dt) 
for all X e (—00, 0) or, equivalently, 

f{u)= I e*^e-''"fi^{dt) (4.26) 

J[0,oo) 

for all u G {—oo,w), and hence for all u € (— oo,0) (see e.g. [33, Ch. 2, §2]); 
in fact, one must have /i({0}) = 0, since /(— 00+) ~ 0. In particular, identity 
(4.26) holds for all u G (— oo,0) with /xo(dt) in place of e^^^ ijyj{dt). By the 
uniqueness of the measure, one has f{u) = J^^ e*"yUo( dt) for all w € [0, 00) 
and all u E (— oc, u;), and hence for all u G K. By dominated convergence, now 
one also obtains the condition /(^^(— 00+) = for all j = 0,1, . . . . □ 

Proof of Proposition 3.5. Similarly to (3.9)-(3.10), one has

$$P_\infty(\eta;x)=\inf_{f\in\mathcal{H}^\infty_+}\frac{\mathsf{E}f(\eta)}{f(x)}\tag{4.27}$$

for all $x\in\mathbb{R}$. Indeed, by definition (3.11) and because the class $\mathcal{H}^\infty_+$ contains all increasing exponential functions, the right-hand side of (4.27) is no greater than its left-hand side, $P_\infty(\eta;x)$. To complete the proof of (4.27), take any $f\in\mathcal{H}^\infty_+$ and any $x\in\mathbb{R}$. Then $f(u)=\int_{(0,\infty)}e^{tu}\mu(dt)$ for all $u\in\mathbb{R}$, where $\mu$ is some Borel measure. So, by the Fubini theorem and (3.11),

$$\mathsf{E}f(\eta)=\int_{(0,\infty)}\mathsf{E}e^{t\eta}\,\mu(dt)\ge\int_{(0,\infty)}P_\infty(\eta;x)\,e^{tx}\,\mu(dt)=P_\infty(\eta;x)f(x),\tag{4.28}$$

which shows that the right-hand side of (4.27) is no less than its left-hand side, $P_\infty(\eta;x)$. Thus, (4.27) is verified. On the other hand, by Proposition 3.4, (1.11), and (3.9)-(3.10),

$$\inf_{f\in\mathcal{H}^\infty_+}\frac{\mathsf{E}f(\eta)}{f(x)}=\lim_{\alpha\uparrow\infty}\inf_{f\in\mathcal{H}^\alpha_+}\frac{\mathsf{E}f(\eta)}{f(x)}=\lim_{\alpha\uparrow\infty}P_\alpha(\eta;x).\tag{4.29}$$

This, together with (4.27), yields (3.13).
That the function $\alpha\mapsto P_\alpha(\eta;x)$ is nondecreasing on $(0,\infty)$ follows immediately by (3.9)-(3.10) and (1.11); that this function is nondecreasing on $(0,\infty]$ now follows by (3.13). □

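Proof of Proposition 3.6. Take indeed any $\sigma\in(0,\infty)$, any $x\in[0,\infty)$, and any r.v.'s $\xi$ and $\eta$ such that $\mathsf{E}\xi\le0=\mathsf{E}\eta$ and $\mathsf{E}\xi^2\le\mathsf{E}\eta^2=\sigma^2$. Let $f(t):=\frac{\sigma^2+t^2}{(x-t)^2}$. Then for any $t<0$ one has $(x-t)^2\,\mathsf{P}(\xi\ge x)\le\mathsf{E}(\xi-t)^2=\mathsf{E}\xi^2+2|t|\,\mathsf{E}\xi+t^2\le\sigma^2+t^2$, whence $\mathsf{P}(\xi\ge x)\le\inf_{t<0}f(t)=\inf_{t\in(-\infty,x)}f(t)=f(-\sigma^2/x)=\mathrm{Ca}(x)$. □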
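The minimization step in the proof above can be checked numerically. The following is only a minimal sketch, not part of the paper: the helper names (`cantelli`, `f`, `grid_inf`) and the sample values of $x$ and $\sigma$ are illustrative assumptions, and the infimum is approximated by a crude grid search over $t<0$.

```python
import math

def cantelli(x, sigma):
    """Cantelli bound Ca(x) = sigma^2 / (sigma^2 + x^2)."""
    return sigma**2 / (sigma**2 + x**2)

def f(t, x, sigma):
    """f(t) = (sigma^2 + t^2) / (x - t)^2, as in the proof of Proposition 3.6."""
    return (sigma**2 + t**2) / (x - t) ** 2

def grid_inf(x, sigma, lo=-50.0, n=200_000):
    """Crude grid approximation of the infimum of f over t in [lo, 0)."""
    step = -lo / n
    return min(f(lo + i * step, x, sigma) for i in range(n))

# Illustrative values (assumptions, not from the paper).
x, sigma = 2.0, 1.0
t_star = -sigma**2 / x                 # minimizer used in the proof
closed_form = f(t_star, x, sigma)      # value of f at the claimed minimizer
numeric = grid_inf(x, sigma)           # grid approximation of the infimum
print(closed_form, numeric, cantelli(x, sigma))
```

With these sample values the three printed numbers should agree to within the grid resolution, illustrating that $f(-\sigma^2/x)=\sigma^2/(\sigma^2+x^2)$.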

Proof of Proposition 3. 7. 

(I) Inequalities $\mathrm{Pin}(x)\le\mathrm{PU}(x)$ and $\mathrm{Be}(x)\le\mathrm{BH}(x)$ follow because for each $\alpha>0$ the class $\mathcal{H}^\alpha_+$ contains the class of all increasing exponential functions, taking also into account the expression (3.10) for $P_\alpha(\eta;x)$, the definitions of $\mathrm{Pin}(x)$, $\mathrm{PU}(x)$, and $\mathrm{Be}(x)$ in (2.5), (1.6), and (1.20), and the expressions for $\mathrm{BH}(x)$, $\mathrm{BH}_{\exp}(\lambda)$, and $\mathrm{PU}_{\exp}(\lambda)$ in (1.4), (1.16), and (1.17). As for the inequality $\mathrm{PU}(x)\le\mathrm{BH}(x)$, it follows, as discussed in the Introduction after (1.6), because $\mathrm{PU}_{\exp}\le\mathrm{BH}_{\exp}$. Inequality $\mathrm{Be}(x)\le\mathrm{Ca}(x)$ follows by (1.20) and the expressions (1.14) and (3.16) for $P_\alpha(\eta;x)$ and $\mathrm{Ca}(x)$, because obviously $\mathsf{E}(\eta-t)_+^2\le\mathsf{E}(\eta-t)^2$ for any $\eta$ and $t$.

(II) The identity in part (II) of Proposition 3.7 follows by [6, (10.5)] and (3.14). 

(III) Applying twice the special l'Hospital-type rule for monotonicity (as well as the l'Hospital rule for limits), one sees that the ratio $\psi(u)/u^2$ decreases from $\frac12$ to $0$ as $u$ increases from $0$ to $\infty$, where the function $\psi$ is defined by (1.2). Now part (III) of Proposition 3.7 follows.

(IV) By re-scaling, w.l.o.g. $\sigma=1$. Consider the function

$$d(x):=\frac{1}{\mathrm{Ca}(x)}-\frac{1}{\mathrm{BH}(x)}=1+x^2-\exp\Big\{\frac{\psi(xy)}{y^2}\Big\}.\tag{4.30}$$

Let $d_3(x):=d'''(x)\,y^3/e^{\psi(xy)/y^2}$ and $d_4(x):=d_3'(x)(1+xy)/(-3y)$. Then $d_4(x)$ is a monic quadratic polynomial in $\ln(1+xy)$ with (coefficients being rational functions of $x$ and $y$, and) a negative discriminant, so that $d_4(x)>0$ and hence $d_3'(x)<0$ for all $x\ge0$. So, $d_3$ decreases on $[0,\infty)$ from $d_3(0)=y^4>0$ to $d_3(\infty-)=-\infty<0$. Hence, $d'''$ is $+-$ on $[0,\infty)$; that is, $d'''(x)$ switches in sign from $+$ to $-$ as $x$ increases from $0$ to $\infty$. Thus, $d''$ is up-down on $[0,\infty)$; that is, switches from increase to decrease on $[0,\infty)$. Since $d''(0)=1>0$ and $d''(\infty-)=-\infty$, one sees that $d''$ is $+-$ on $[0,\infty)$. Since $d'(0)=0$ and $d'(\infty-)=-\infty$, one sees that $d'$ is $+-$ on $(0,\infty)$. Since $d(0)=0$ and $d(\infty-)=-\infty$, one sees that $d$ is $+-$ on $(0,\infty)$ as well. This proves the existence of a unique $u_y$ in $(0,\infty)$ such that $\mathrm{Ca}(x)<\mathrm{BH}(x)$ for $x\in(0,u_y)$ and $\mathrm{Ca}(x)>\mathrm{BH}(x)$ for $x\in(u_y,\infty)$. That $u_y$ increases from $1.585\ldots$ to $\infty$ as $y$ increases from $0$ to $\infty$ now follows by part (III) of Proposition 3.7, since $\mathrm{Ca}(x)$ does not depend on $y$.

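(V) Take any $x>0$. By (1.5), $\mathrm{PU}_{\exp}(\lambda)=\exp\big\{\frac{\lambda^2}{2}(1-\varepsilon)\sigma^2+\frac{e^{\lambda y}-1-\lambda y}{y^2}\,\varepsilon\sigma^2\big\}$ strictly increases from $\exp\big\{\frac{\lambda^2}{2}\sigma^2\big\}=\mathsf{E}\exp\{\lambda\Gamma_{\sigma^2}\}$ to $\exp\big\{\frac{e^{\lambda y}-1-\lambda y}{y^2}\,\sigma^2\big\}=\mathrm{BH}_{\exp}(\lambda)$ as $\varepsilon$ increases from $0$ to $1$. So, in view of (1.6), $\mathrm{PU}(x)$ is nondecreasing in $\varepsilon\in(0,1)$.

Moreover, $\mathrm{PU}(x)$ is strictly increasing in $\varepsilon\in(0,1)$, from $\mathrm{EN}(x)$ to $\mathrm{BH}(x)$; this follows by (3.1). Indeed, $\lambda_x$ is the only positive root of the equation $f'(\lambda)=x$, where (see (4.19)) $f'(\lambda)=\lambda(1-\varepsilon)\sigma^2+\frac{e^{\lambda y}-1}{y}\,\varepsilon\sigma^2$ is strictly increasing in $\lambda>0$ and in $\varepsilon\in[0,1]$. So, the unique root $\lambda_x$ of the equation $f'(\lambda)=x$ is decreasing in $\varepsilon\in[0,1]$, from $\frac{x}{\sigma^2}<\infty$ to $\frac1y\ln(1+xy/\sigma^2)>0$, so that $\lambda_x>0$ remains bounded away from $0$ and $\infty$ as $\varepsilon$ increases from $0$ to $1$.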

Now part (V) of Proposition 3.7 follows, and the entire Proposition 3.7 is 
proved. □ 
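Parts (III)-(V) lend themselves to a quick numerical sanity check. The sketch below is an illustration under stated assumptions rather than anything from the paper: it takes $\psi$ from (1.2), $\mathrm{EN}(x)=\exp\{-x^2/(2\sigma^2)\}$, $\mathrm{BH}$ as in (1.1), and $\mathrm{PU}$ as the infimum in (1.6) with $\mathrm{PU}_{\exp}$ as in (1.5); the function names, the bisection routines, and the sample parameters are all hypothetical choices.

```python
import math

def psi(u):
    """psi(u) = (1 + u) ln(1 + u) - u, as in (1.2)."""
    return (1.0 + u) * math.log1p(u) - u

def en(x, sigma):
    """Normal-type exponential bound exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / (2.0 * sigma**2))

def bh(x, sigma, y):
    """Bennett-Hoeffding bound exp(-(sigma^2 / y^2) psi(x y / sigma^2))."""
    return math.exp(-(sigma**2 / y**2) * psi(x * y / sigma**2))

def pu(x, sigma, y, eps, iters=200):
    """PU(x) = exp(inf over lam > 0 of (-lam x + f(lam))), with f = ln PUexp."""
    def fprime(lam):
        return lam * (1 - eps) * sigma**2 + math.expm1(lam * y) / y * eps * sigma**2
    lo, hi = 0.0, x / sigma**2   # f'(lam) >= lam sigma^2, so the root is <= x / sigma^2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fprime(mid) < x:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    f = lam**2 / 2 * (1 - eps) * sigma**2 \
        + (math.expm1(lam * y) - lam * y) / y**2 * eps * sigma**2
    return math.exp(-lam * x + f)

def crossing(y, iters=200):
    """Positive root of d(x) = 1/Ca(x) - 1/BH(x) with sigma = 1 (y = 0 means y -> 0+)."""
    def d(x):
        expo = x * x / 2.0 if y == 0 else psi(x * y) / y**2
        return 1.0 + x * x - math.exp(expo)
    lo, hi = 0.5, 1.0            # d(0.5) > 0 for every y, since psi(u) <= u^2 / 2
    while d(hi) > 0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if d(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Part (III): psi(u) / u^2 along an increasing sequence of u.
ratios = [psi(u) / u**2 for u in (0.01, 0.1, 1.0, 10.0, 100.0, 1e4)]
# Part (IV): crossing point of Ca and BH for y -> 0+, y = 1, y = 10.
u0, u1, u10 = crossing(0.0), crossing(1.0), crossing(10.0)
# Part (V): PU(3) for increasing eps, with sigma = y = 1.
pu_vals = [pu(3.0, 1.0, 1.0, e) for e in (0.1, 0.3, 0.5, 0.7, 0.9)]
print(ratios, (u0, u1, u10), en(3.0, 1.0), pu_vals, bh(3.0, 1.0, 1.0))
```

The printed ratios should decrease from just below $1/2$, the crossing points should increase with $y$ starting near $1.585$, and the $\mathrm{PU}$ values should lie strictly between $\mathrm{EN}(3)$ and $\mathrm{BH}(3)$ and increase with $\varepsilon$.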

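Proof of Proposition 3.8. In view of (4.19), for $f:=\ln\mathrm{PU}_{\exp}$ and each $x>0$ the equation $f'(\lambda_x)=x$ implies that there is some $\alpha_x\in(0,1)$ such that

$$\lambda_x(1-\varepsilon)\sigma^2=(1-\alpha_x)x\quad\text{and}\quad\frac{e^{\lambda_xy}-1}{y}\,\varepsilon\sigma^2=\alpha_xx,\tag{4.31}$$

whence

$$\lambda_x=\lambda_{x,1}:=\frac{(1-\alpha_x)x}{(1-\varepsilon)\sigma^2}\quad\text{and}\quad\lambda_x=\lambda_{x,2}:=\frac1y\ln\Big(1+\frac{\alpha_xxy}{\varepsilon\sigma^2}\Big).\tag{4.32}$$

On the other hand, introducing $f_1(\lambda):=-\lambda(1-\alpha_x)x+\frac{\lambda^2}{2}(1-\varepsilon)\sigma^2$ and $f_2(\lambda):=-\lambda\alpha_xx+\frac{e^{\lambda y}-1-\lambda y}{y^2}\,\varepsilon\sigma^2$, by (3.1) and (4.20) one has

$$\ln\mathrm{PU}(x)=-\lambda_xx+f(\lambda_x)=f_1(\lambda_x)+f_2(\lambda_x)=f_1(\lambda_{x,1})+f_2(\lambda_{x,2})=g(\alpha_x),$$

where

$$g(\alpha):=-\frac{(1-\alpha)^2x^2}{2(1-\varepsilon)\sigma^2}-\frac{\varepsilon\sigma^2}{y^2}\,\psi\Big(\frac{\alpha xy}{\varepsilon\sigma^2}\Big)=\ln\Big(\mathrm{EN}_{(1-\varepsilon)\sigma^2}\big((1-\alpha)x\big)\,\mathrm{BH}_{\varepsilon\sigma^2,y}(\alpha x)\Big)\tag{4.33}$$

and $\psi$ is defined by (1.2); thus, the expression in (3.18) equals $\mathrm{PU}(x)$. The derivative

$$g'(\alpha)=\frac{(1-\alpha)x^2}{(1-\varepsilon)\sigma^2}-\frac xy\ln\Big(1+\frac{\alpha xy}{\varepsilon\sigma^2}\Big)\tag{4.34}$$

decreases from $g'(0)>0$ to $g'(1)<0$ as $\alpha$ increases from $0$ to $1$. Hence, there is a unique maximum point of $g$ in $[0,1]$ (say $\tilde\alpha_x$). Moreover, $\alpha=\tilde\alpha_x$ must be the unique root in $(0,1)$ of the equation $g'(\alpha)=0$, which is the same as (3.19). But, by (4.34) and (4.32), this equation is satisfied by $\alpha=\alpha_x\in(0,1)$. Thus, $\tilde\alpha_x=\alpha_x$. This completes the proof of (3.17), (3.18), and (3.19).

Finally, it follows from (4.31) that $\frac{\alpha_x}{1-\alpha_x}=\frac{\varepsilon}{1-\varepsilon}\,\frac{e^{\lambda_xy}-1}{\lambda_xy}$, which increases in $\lambda_x$ from $\frac{\varepsilon}{1-\varepsilon}$ to $\infty$ as $\lambda_x$ increases from $0$ to $\infty$, which it does (according to the last sentence in Proposition 3.1) as $x$ increases from $0$ to $\infty$. Now the entire proposition is proved. □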
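The identity $\ln\mathrm{PU}(x)=g(\alpha_x)=\max_{\alpha\in[0,1]}g(\alpha)$ established above can be illustrated numerically. This is a hedged sketch only: it assumes $f=\ln\mathrm{PU}_{\exp}$ with $\mathrm{PU}_{\exp}$ as in (1.5) and $g$, $g'$ as in (4.33)-(4.34); the helper names (`bisect`, `ln_pu`, `g`, `alpha_x`) and the sample parameters are hypothetical.

```python
import math

def psi(u):
    """psi(u) = (1 + u) ln(1 + u) - u, as in (1.2)."""
    return (1.0 + u) * math.log1p(u) - u

def bisect(fn, lo, hi, iters=200):
    """Root of a monotone function fn that changes sign on [lo, hi]."""
    flo_pos = fn(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (fn(mid) > 0) == flo_pos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ln_pu(x, sigma, y, eps):
    """ln PU(x) computed directly as min over lam of (-lam x + f(lam))."""
    def fprime_minus_x(lam):
        return lam * (1 - eps) * sigma**2 + math.expm1(lam * y) / y * eps * sigma**2 - x
    lam = bisect(fprime_minus_x, 0.0, x / sigma**2)
    f = lam**2 / 2 * (1 - eps) * sigma**2 \
        + (math.expm1(lam * y) - lam * y) / y**2 * eps * sigma**2
    return -lam * x + f

def g(alpha, x, sigma, y, eps):
    """g(alpha) as in (4.33): log of the product of the EN and BH factors."""
    return -((1 - alpha) * x) ** 2 / (2 * (1 - eps) * sigma**2) \
        - eps * sigma**2 / y**2 * psi(alpha * x * y / (eps * sigma**2))

def alpha_x(x, sigma, y, eps):
    """Root in (0, 1) of g'(alpha) = 0, with g' as in (4.34)."""
    def gprime(alpha):
        return (1 - alpha) * x**2 / ((1 - eps) * sigma**2) \
            - x / y * math.log1p(alpha * x * y / (eps * sigma**2))
    return bisect(gprime, 0.0, 1.0)

# Illustrative parameters (assumptions, not from the paper).
x, sigma, y, eps = 3.0, 1.0, 1.0, 0.5
a = alpha_x(x, sigma, y, eps)
print(a, g(a, x, sigma, y, eps), ln_pu(x, sigma, y, eps))
```

For these parameters the two printed log-bounds should coincide up to rounding, and the computed root should exceed $\varepsilon$, in line with the last paragraph of the proof.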

Proof of Proposition 3.9. This follows immediately from Lemma 4.7 with $\sigma_0=\sigma$, $\beta_0=\varepsilon\sigma^2y$, and $\beta=\sigma^2y$. □
Proof of Proposition 3.10. This follows because of the known fact that the function $\mathbb{Z}\ni j\mapsto\mathsf{P}(\Pi_\theta\ge j)$ is log-concave (see e.g. [39, Theorem 1 and Remark 13]), in the sense of [39, Definition 1]. □

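Proof of Proposition 3.11. The right-hand side of (3.22) with $\mathsf{P}^{\mathsf{LC}}(y\tilde\Pi_{\varepsilon\sigma^2/y^2}\ge z)$ replaced by $\mathsf{P}(y\tilde\Pi_{\varepsilon\sigma^2/y^2}\ge z)$ would equal $\mathsf{P}(\Gamma_{(1-\varepsilon)\sigma^2}+y\tilde\Pi_{\varepsilon\sigma^2/y^2}\ge x)$. So, since $\mathsf{P}^{\mathsf{LC}}(y\tilde\Pi_{\varepsilon\sigma^2/y^2}\ge z)$ majorizes $\mathsf{P}(y\tilde\Pi_{\varepsilon\sigma^2/y^2}\ge z)$, the right-hand side of (3.22) majorizes $\mathsf{P}(\Gamma_{(1-\varepsilon)\sigma^2}+y\tilde\Pi_{\varepsilon\sigma^2/y^2}\ge x)$. Also, the right-hand side of (3.22) is log-concave in $x\in\mathbb{R}$ by the well-known theorem, which states that $\int_{\mathbb{R}}f(x,z)\,dz$ is log-concave in $x\in\mathbb{R}$ if a function $\mathbb{R}^2\ni(x,z)\mapsto f(x,z)$ is log-concave (see e.g. [53] as well as the corresponding review by Perlman in Mathematical Reviews); here we also used the obvious fact that any normal density function is log-concave. This concludes the proof of Proposition 3.11. □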

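Proof of Proposition 3.12. All the limit relations in this proof are of course as $x\to\infty$, unless specified otherwise. By Proposition 3.8, $\alpha_x\to1$. Equations (4.32) allow one to qualify the rate of convergence of $\alpha_x$ to $1$. Indeed,

$$\ln\Big(1+\frac{\alpha_xxy}{\varepsilon\sigma^2}\Big)\sim\ln\frac{\alpha_xxy}{\varepsilon\sigma^2}=\ln\frac{xy}{\varepsilon\sigma^2}+\ln\alpha_x\sim\ln\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big),$$

whence, by (4.32),

$$1-\alpha_x=\frac{(1-\varepsilon)\sigma^2}{xy}\ln\Big(1+\frac{\alpha_xxy}{\varepsilon\sigma^2}\Big)\sim\frac{(1-\varepsilon)\sigma^2}{xy}\ln\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big).\tag{4.35}$$

Now one can see that

$$\ln\Big(1+\frac{\alpha_xxy}{\varepsilon\sigma^2}\Big)-\ln\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big)=\ln\Big(1+\frac{(\alpha_x-1)xy}{\varepsilon\sigma^2+xy}\Big)\sim\alpha_x-1\sim-\frac{(1-\varepsilon)\sigma^2}{xy}\ln\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big)$$

and so, again by (4.32),

$$\lambda_x=\frac1y\ln\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big)\Big(1-\frac{(1-\varepsilon)\sigma^2}{xy}(1+o(1))\Big);$$

$$1-\alpha_x=\frac{(1-\varepsilon)\sigma^2}{xy}\ln\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big)\Big(1-\frac{(1-\varepsilon)\sigma^2}{xy}(1+o(1))\Big);$$

$$\frac{(1-\alpha_x)^2x^2}{2(1-\varepsilon)\sigma^2}=\frac{(1-\varepsilon)\sigma^2}{2y^2}\ln^2\Big(1+\frac{xy}{\varepsilon\sigma^2}\Big)+o(1).\tag{4.36}$$

Next, with the same $\psi(u)=(1+u)\ln(1+u)-u$ as in (1.2), one has $\psi'(u)=\ln(1+u)$ and $\psi''(u)=\frac{1}{1+u}$ for $u>0$, whence $\psi(u)-\psi(v)=(u-v)\ln(1+v)+O((u-v)^2/u)$ as $v>u\to\infty$. Hence and in view of (4.35),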
Now (3.18), (4.33), (4.36), and (4.37) yield

,2 , , 



Next, 



ea"^ f xy\ ea"^ + xy r 1 / xy\ 



y2 vecr^y y 



,,2 ■ 



Using this together with $\ln\big(\varepsilon+\frac{xy}{\sigma^2}\big)=\ln\big(1+\frac{xy}{\sigma^2}\big)-\frac{(1-\varepsilon)\sigma^2}{xy}(1+o(1))$, (4.38), and the definition of $\mathrm{BH}(x)$ in (1.1), one concludes the proof of Proposition 3.12. □

Proof of Proposition 3.13. Fix indeed any $\sigma>0$ and $y>0$. By rescaling, w.l.o.g. $y=1$. For brevity, let

$$\theta:=\sigma^2\quad\text{and}\quad v:=\frac{xy}{\sigma^2}=\frac{x}{\theta}.$$

Letting also $k:=\lceil x+\theta\rceil=\lceil\theta(1+v)\rceil$ (so that $\theta(1+v)\le k<\theta(1+v)+1$) and using the Stirling formula, one has the following for $x>0$:

$$\mathsf{P}(y\tilde\Pi_{\sigma^2/y^2}\ge x)=\mathsf{P}(\Pi_\theta\ge x+\theta)\ge\mathsf{P}(\Pi_\theta=k)=e^{-\theta}\theta^k/k!$$
^ — - exp{6'(l + w) [l-ln(l+z; + l/6')]} 



^3/2 

exp {e{v -{1 + v) ln(l + v))} = 



which proves the first inequality in (3.23).

The second inequality in (3.23) follows by the first inequality in (1.20), since $\tilde\Pi_\theta$ is the limit in distribution as $n\to\infty$ of $S=X_1+\dots+X_n$, where the $X_i$'s are i.i.d. r.v.'s each with the centered Bernoulli distribution with parameter $\theta/n$.

The second inequality in (3.24) follows similarly by the Bennett-Hoeffding inequality (1.1).

The third inequality in (3.23) is the second inequality in (1.20).

It remains to verify the first inequality in (3.24). Let, for brevity, $G(u):=\mathsf{P}(\Pi_\theta\ge u)=\mathsf{P}(\tilde\Pi_\theta\ge u-\theta)=\mathsf{P}(y\tilde\Pi_{\sigma^2/y^2}\ge u-\theta)$. Then, by Proposition 3.10 (with $j=\lceil u-1\rceil$) and Remark 1.4, for $x:=u-\theta$ one has

P'-^{yU^2/y2 ^x) = P'-^iHe ^ u) 

< G{j) = G(«)^g^ ^ 3 G{u) < u P(n, ^ u) 

= {x + 6)P{ytl„2/y2 ^x) 

if u > 1, since f e-^ < G{j) = YZ=j ^e"' < f ^"^^ for all j = 1, 2, . . . 
such that j > 6. This concludes the proof of Proposition 3.13. □ 
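The two simplest ingredients of the proof above, namely the one-term lower bound $\mathsf{P}(\Pi_\theta\ge x+\theta)\ge e^{-\theta}\theta^k/k!$ with $k=\lceil x+\theta\rceil$ and the Bennett-Hoeffding upper bound for the centered Poisson tail, can be checked numerically. The sketch below is illustrative only: it assumes $y=1$ and $\theta=\sigma^2$ as in the proof, takes $\psi$ from (1.2), and truncates the Poisson tail sum after a fixed number of terms; all names and parameter grids are hypothetical.

```python
import math

def psi(u):
    """psi(u) = (1 + u) ln(1 + u) - u, as in (1.2)."""
    return (1.0 + u) * math.log1p(u) - u

def poisson_pmf(k, theta):
    """P(Pi_theta = k), computed on the log scale for numerical stability."""
    return math.exp(-theta + k * math.log(theta) - math.lgamma(k + 1))

def centered_poisson_tail(x, theta, terms=400):
    """P(Pi_theta - theta >= x), truncating the (rapidly decaying) tail sum."""
    k0 = math.ceil(x + theta)
    return sum(poisson_pmf(k, theta) for k in range(k0, k0 + terms))

def bh_poisson(x, theta):
    """Bennett-Hoeffding bound with sigma^2 = theta and y = 1."""
    return math.exp(-theta * psi(x / theta))

# Illustrative parameter grid (an assumption, not from the paper).
results = []
for theta in (0.5, 2.0, 10.0):
    for x in (0.5, 1.0, 3.0, 10.0):
        k = math.ceil(x + theta)
        results.append((poisson_pmf(k, theta),
                        centered_poisson_tail(x, theta),
                        bh_poisson(x, theta)))
for lower, tail, upper in results:
    print(lower, tail, upper)
```

In each printed row the exact tail should sit between the single Poisson term and the Bennett-Hoeffding bound.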
Proof of Proposition 3.14. The second inequality in (3.25) follows by inequality (2.5) and Lemma 4.9.

The second inequality in (3.26) follows similarly by inequality (1.6) and Lemma 4.9.

The third inequality in (3.25) is inequality (2.6).

It remains to prove the first inequality in (3.25) and the first inequality in (3.26). W.l.o.g. $y=1$.

To prove the first inequality in (3.25), use identity (3.18), the first inequality in (3.23), and the Laplace method for the asymptotics of integrals, as follows. By (3.27) (with $y=1$), (3.20), the first inequality in (3.23), (3.17), and (3.18),

P{r]a,i,e >x)^ ( P(n,,2 ^ z) ?{x - r(i_,),2 e dz) (4.39) 

J OLxX 

r BR,,.^i{z) r {x-z)^x. 

L^J7^^^n"2(r^}^^ 

> r e'^(^)-''("^^) d^ (4.40) 

J axX 

for all large enough a; > 0, where 

h{z) := K{z) :=ln(BH,,2,i(^)EN(i_,y2(a;-0)) =-^|^^|j^-ea2V'(^), 

recalling also (1.1). By (3.18) (or because is a root of equation (3.19)), one 
has h'{axx) = 0. Also, for all large enough a; > and all z G [axX,x] one 
has h"{z) = - " ^ - (i-£)<7^ e^^^^*^ ^^"^^S (^-20) again) and hence 

h{z) — h{axx) ^ — *'^]^" ■ Note also that, by (4.35), x — UxX — > oo as a; — > oo. 
Now the first inequality in (3.25) follows by (4.39)-(4.40). 

Finally, to prove the first inequality in (3.26), use (3.27) and (3.22). Let $I_1$, $I_2$, and $I_3$ be the integrals of the integrand in (3.22) (with $y=1$) over the intervals $(-\infty,1]$, $(1,2x]$, and $(2x,\infty)$, respectively. Then, in view of the trivial bound $\mathsf{P}^{\mathsf{LC}}(\tilde\Pi_{\varepsilon\sigma^2}\ge z)\le1$ for all $z\in\mathbb{R}$,



[ P{x - r(i_,)^2 G dz) = P(r(i_,)^2 > x - 1) 

J — 00 



e-" PU(a;) < e-^x^/^ p^^^^^^^ ^ ^-^ < p(^^_^_^ ^ ^) (4 4^) 

for all large enough x > 0; the first inequality in the line (4.41) is a limit case of 
the inequality in (1.6) (cf. Lemma 4.9), while the second inequality in the line 
(4.41) is the first inequality in (3.25), proved in the previous paragraph. Quite 
similarly, 



/•oo 

/ P(a; - r(i_^)<,2 e dz) 

J2x 



P(r(i_e)<^2 < -a;) = P(r(i_£)^2 > a;) ^ a; P{r]a,v,s > x) 
for all large enough $x>0$. Finally, by the first inequality in (3.24), for $x>1$

z P(ne<.2 > z) P{x - r(i_,)<,2 e d^) 

^2X [ P(n,,2 ^ Z) P{X - r(i_,),2 G d^) = 2X P{Tla,y,e > x). 

JR 

Thus, the first inequality in (3.26) follows from (3.22) and the above bounds on $I_1$, $I_3$, and $I_2$. This concludes the proof of Proposition 3.14. □

Proof of Proposition 3.16. This follows from the more general [38, Theorem 4.2] 
(or, rather, from its proof) and Proposition 3.3. □ 

Proof of Proposition 3.17. Fix indeed any a > 1 and ^ > 0. By Proposition 3.2(vi), 
(3.28) is equivalent to 

P„(ne; k) ~ P(ne >k)^^ P{ne > k) (4.42) 

as Z 9 fc ^ oo. To begin proving this, take any A: = 0, 1, ... . Then, by (3.4), 

m{k) = m„,n,(A;) = k + >k + l = m{tk+i); (4.43) 

E{Ile - k)1 

the last equality here holds by (3.5), while the inequality in (4.43) follows be- 
cause (Ilg — fc)" ^ (He — A:)"~^ a.s., and the latter inequality is a.s. strict on the 
event {II^i ^ k + 2}, which is of nonzero probability. So, by Proposition 3.2(i), 

tk+i < k. (4.44) 

The other key observation about tk+i is that it is close enough to k for large 
k. To see that, let S € (0, 1) and k £ Z he varying so that 5 i and k ^ oo. 
Then 

E{U0 -k + 5)l = ak + Sk, 
where aj := aj^k '■= {j — k + d)°'j^e~^ and Sh := JZ'jLk+i Note that 

a, V j-k + S ) j + l^\l + s) j + 
for J > A; + 1. Hence, 

gk+l nk+l 

Sfc - afe+i = (1 + 6r—-—e-' 



{k + iy. {k + iy. ' 

and so, 

E{Ue -k + S)l^ak + a,+i ~ ^e"^ (j« + ^^^) • (4.45) 

Similarly, 



E(n, - k + Sn-' - ^^-'(^"-^ + ^^)- (4-46) 
Now let us choose an arbitrary constant $c>0$ and then specify $\delta$ by the formula

so that 6" = o(i) and 6°'-'^ = f . Then, using (4.45) and (4.46), one has the 
following for large enough k (cf. (4.43)): 

m{k~S) = k-5+ ^^(1 + o(l)) <k + l, 

whence tk+i > k — S — k — (^^) . RccalUng now (4.44) and that c > was 

arbitrary, one concludes that indeed i^+i is close to k, in the sense that 

fc-o(fc-i/("-i)) <tk+i < k. 

Revisiting (4.45) and (4.46) with S := k - t^+i = o(fc~^/("~^)) , one has 

ak+l nk+l 

E(n,-tfe+0+~ (^r^e-^ and E(n, - t,+i)r ' ~ (^^^'^ 
So, recalling the last expression in (3.6), one concludes that 

nk+l 

P„(n,;fc + l)~^^^e-^ (4.47) 
On the other hand, it is easy to see (cf. the first relation in (4.45)) that 

nk+l nk+2 

P(n^^^ + ^)~(fcTI)!^" ^""^ P^n^>^ + ^)-(fcT^^" 

Now (4.42) follows by (4.47). □ 
4.2. Proofs of the lemmas

Proof of Lemma 4-1 ■ This follows because y^^^ is nondecreasing in a; e 
[0, y] from to ^^+^rj^ = j^^f^- □ 
Proof of Lemma 4.2. This follows by Lemma 4.1:

□ 

Proof of Lemma 4.3. Take any $\beta$ satisfying condition (4.2). Let $f(x):=\sigma^2x^{3/2}-\beta(x+\sigma^2)$. Then $f(0)=-\beta\sigma^2<0$, $f(y^2)=\sigma^2y^3-\beta(y^2+\sigma^2)\ge0$ by (4.2), and the function $f$ is convex on $[0,\infty)$. Hence, $f$ has exactly one positive root, say $x_*$, and at that $x_*\in(0,y^2]$. Let $b:=x_*^{1/2}$, so that $b\in(0,y]$ and $b$ is the only positive root of equation (4.3). Letting now $a:=\sigma^2/b$, one has $X_{a,b}\le y$ a.s., $\mathsf{E}X_{a,b}=0$, $\mathsf{E}X_{a,b}^2=ab=\sigma^2$, and $\mathsf{E}(X_{a,b})_+^3=\frac{ab^3}{a+b}=\frac{\sigma^2b^3}{b^2+\sigma^2}=\beta$ by (4.3). It also follows that $a=\frac{\beta b}{b^3-\beta}$. Finally, the uniqueness of the pair $(a,b)$ follows from the uniqueness of the positive root $b$ of equation (4.3). □
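The construction in this proof is easy to carry out numerically. The following sketch rests on assumptions that are only implied, not stated, by the text above: condition (4.2) is read as $0<\beta\le\sigma^2y^3/(y^2+\sigma^2)$, equation (4.3) as $\sigma^2b^3=\beta(b^2+\sigma^2)$, and $X_{a,b}$ as the zero-mean two-point r.v. with values $-a$ and $b$ (which matches the moment identities used in the proof). The function names and sample inputs are hypothetical.

```python
def solve_ab(sigma, y, beta, iters=200):
    """Find b in (0, y] with sigma^2 b^3 = beta (b^2 + sigma^2); then a = sigma^2 / b."""
    if not 0 < beta <= sigma**2 * y**3 / (y**2 + sigma**2):
        raise ValueError("beta is outside the assumed range of condition (4.2)")

    def F(b):
        # F(b) = f(b^2) in the notation of the proof; negative below the root.
        return sigma**2 * b**3 - beta * (b**2 + sigma**2)

    lo, hi = 0.0, y
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return sigma**2 / b, b

def two_point_moments(a, b):
    """Mean, second moment and E(X_+)^3 of the zero-mean law on {-a, b}."""
    p_b = a / (a + b)            # P(X = b); P(X = -a) = b / (a + b)
    mean = -a * (1 - p_b) + b * p_b
    second = a * a * (1 - p_b) + b * b * p_b
    third_plus = b**3 * p_b
    return mean, second, third_plus

# Illustrative inputs (assumptions): sigma = 1, y = 2, beta = 0.5.
a, b = solve_ab(1.0, 2.0, 0.5)
mean, second, third_plus = two_point_moments(a, b)
print(a, b, mean, second, third_plus)
```

For these inputs the computed pair $(a,b)$ should reproduce mean $0$, variance $\sigma^2=1$, and positive-part third moment $\beta=0.5$.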

Proof of Lemma 4.4. Let $X$ be any r.v. such that $X\le y$ a.s., $\mathsf{E}X\le0$, $\mathsf{E}X^2\le\sigma^2$, and $\mathsf{E}X_+^3\le\beta$. Let us consider separately the following possible cases: $w\le-a$, $-a\le w\le0$, and $w\ge0$.
Case 1: $w\le-a$. Then

$$f_1(x):=\lambda_0+\lambda_1x+\lambda_2x^2+\lambda_3x_+^3\ge(x-w)_+^3$$

for all $x\in\mathbb{R}$, where

$$\lambda_0:=\frac{2a^3b}{3a+b}-w^3,\qquad\lambda_1:=3\,\frac{b(a^2+w^2)+a(3w^2-a^2)}{3a+b},$$

$$\lambda_2:=-3\,\frac{(a+b)w+2a(a+w)}{3a+b},\qquad\lambda_3:=\frac{(a+b)^3}{b^2(3a+b)}$$

are obviously nonnegative constants; moreover, $f_1(x)=(x-w)_+^3$ for $x\in\{-a,b\}$.
This claim can be verified using the Mathematica command

    Reduce[b > 0 && a > 0 && w < -a &&
      A0 + A1 x + A2 x^2 + A3 Max[0, x]^3 - Max[0, x - w]^3 <= 0, {w, x}]

which produces the output

    b > 0 && a > 0 && w < -a && (x == -a || x == b)

where A0, A1, A2, A3 represent the constants $\lambda_0,\lambda_1,\lambda_2,\lambda_3$ as defined above; this verification takes about 1 second (this and other execution times given in this paper are in reference to an Intel Core 2 Duo PC with 4 GB of RAM).

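Therefore and because $\mathsf{E}X\le0=\mathsf{E}X_{a,b}$, $\mathsf{E}X^2\le\sigma^2=\mathsf{E}X_{a,b}^2$, and $\mathsf{E}X_+^3\le\beta=\mathsf{E}(X_{a,b})_+^3$, one has

$$\mathsf{E}(X-w)_+^3\le\lambda_0+\lambda_1\mathsf{E}X+\lambda_2\mathsf{E}X^2+\lambda_3\mathsf{E}X_+^3\le\lambda_0+\lambda_1\mathsf{E}X_{a,b}+\lambda_2\mathsf{E}X_{a,b}^2+\lambda_3\mathsf{E}(X_{a,b})_+^3=\mathsf{E}(X_{a,b}-w)_+^3.$$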
Case 2: $-a\le w\le0$. Then

$$f_2(x):=\lambda_2(x+a)^2+\lambda_3x_+^3\ge(x-w)_+^3$$

for all $x\in\mathbb{R}$, where

$$\lambda_2:=\frac{-3w(b-w)^2}{(a+b)(3a+b)}\quad\text{and}\quad\lambda_3:=\frac{(b-w)^2\big(2(w+a)+a+b\big)}{b^2(3a+b)}$$

are obviously nonnegative constants; moreover, $f_2(x)=(x-w)_+^3$ for $x\in\{-a,b\}$. This claim can be verified using a similar Reduce command, which takes under 1 second. Therefore,

$$\mathsf{E}(X-w)_+^3\le\lambda_2\mathsf{E}(X+a)^2+\lambda_3\mathsf{E}X_+^3\le\lambda_2\mathsf{E}(X_{a,b}+a)^2+\lambda_3\mathsf{E}(X_{a,b})_+^3=\mathsf{E}(X_{a,b}-w)_+^3.$$
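As a complement to the symbolic Reduce checks mentioned in Cases 1 and 2, the two majorization claims can be spot-checked on a grid. This is merely a numerical sketch under assumptions: the coefficient formulas are those implied by the tangency conditions at $-a$ and $b$ in the two cases, the grid and the parameter triples $(a,b,w)$ are arbitrary choices, and a grid check is of course no substitute for the symbolic verification.

```python
def pos(v):
    """Positive part v_+ = max(0, v)."""
    return v if v > 0 else 0.0

def f1(x, a, b, w):
    """Case 1 majorant (w <= -a): l0 + l1 x + l2 x^2 + l3 x_+^3."""
    s = 3 * a + b
    l0 = 2 * a**3 * b / s - w**3
    l1 = 3 * (b * (a * a + w * w) + a * (3 * w * w - a * a)) / s
    l2 = -3 * ((a + b) * w + 2 * a * (a + w)) / s
    l3 = (a + b) ** 3 / (b * b * s)
    return l0 + l1 * x + l2 * x * x + l3 * pos(x) ** 3

def f2(x, a, b, w):
    """Case 2 majorant (-a <= w <= 0): l2 (x + a)^2 + l3 x_+^3."""
    s = 3 * a + b
    l2 = -3 * w * (b - w) ** 2 / ((a + b) * s)
    l3 = (b - w) ** 2 * (2 * (w + a) + a + b) / (b * b * s)
    return l2 * (x + a) ** 2 + l3 * pos(x) ** 3

def target(x, w):
    """The function (x - w)_+^3 being majorized."""
    return pos(x - w) ** 3

def min_gap(f, a, b, w, lo=-10.0, hi=10.0, n=4001):
    """Smallest value of f(x) - (x - w)_+^3 over an equally spaced grid."""
    step = (hi - lo) / (n - 1)
    return min(f(lo + i * step, a, b, w) - target(lo + i * step, w) for i in range(n))

# Hypothetical test triples (a, b, w) for each case.
case1 = [(1.0, 2.0, -1.5), (0.5, 1.0, -0.5), (2.0, 1.0, -3.0)]
case2 = [(1.0, 2.0, -0.5), (2.0, 1.0, -1.9), (0.5, 1.0, -0.2)]
gaps1 = [min_gap(f1, a, b, w) for a, b, w in case1]
gaps2 = [min_gap(f2, a, b, w) for a, b, w in case2]
print(gaps1, gaps2)
```

Up to rounding, the printed minimal gaps should be nonnegative, with equality attained at $x=-a$ and $x=b$.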

Case 3:w^0. Then 

for all X £ (— 00, y], since ^^~f ^ is nondccrcasing in a; G [w, 00) for each w 0; 
moreover, it is obvious that /^{x) = (x — w)'^ for x E (— oo,0] U {y}. 

Further, y^ ^ b^ > ^ = /3; hence, again by (4.8), a > 0. It follows that 

fsix) = {x- w)\ for X G {-a, b}. Moreover, E(X- ^)^ = /?. Thus, 

E(X - n.)l < EX3 < ^l-p± E(X,,,)3 = E(X,,, - ..)3^. 

Moreover, 

EX?. = ~ab= -P^ ^ =ab = a'; (4.48) 

the inequality here takes place because J^" ^ decreases in u > /3^/3^ while, as 

shown, y^ '^b^ > /3; the inequality in (4.48) is strict if /? ^ (because then 

= /3 < pq^' and hence 6 < y). 
Thus, in all the three cases, one has equality (4.7). Moreover, in the case w ^ 
the maximum in (4.5) is attained and equals E{Xafi — w)\, since Xafi ^ y a.s., 
EXafi = 0, EX^j, = fj^, and E{Xa,b)+ = The last sentence of Lemma 4.4 has 
also been proved. 

To complete the proof of the lemma, it remains to show that in the case 
w ^ the maxima in (4.5) and (4.6) are attained and equal E{X^ ^ — w)^; the 
same last sentence of Lemma 4.4 shows that in this case the max in (4.5) is not 
attained at X = X~ rif B ^ -\t-^ ~ because then E X? - < ct^ . 

a.b I I y^+a^ a.b 

Thus, it suSices to construct a r.v., say Xy, such that E{Xy — w)^ = E(X- ^ — 
w)3 , while Xy ^ y a.s., EXy = 0, EXy = , and E{Xy)\ — (3. One way to 

satisfy all these conditions is to let Xy ~ p5y + qiS-a^ + ri5y^ where v is close 
enough to — oc, ri := — Ag, qi := <? + Ag, oi := S, + Aa, p :— (3/y'^, q '■— 1 — p, 

^1 '■= - d^+|(t+a)^ ' •= 10^' ^ Va'^-ab = a"^ - ay, and a and 6 are 
given by (4.8). □ 
Proof of Lemma 4.5. Take any / e ^F'l so that / is also in T\ y and T\\ then
/' is convex or, equivalcntly, /" is nondecr easing; at that, /" is also convex. For 
any ^ e M, introduce the functions ■= Qzj and hz := hzj by the formulas 

$$g_z(u):=\big(f(z)+(u-z)f'(z)+(u-z)^2f''(z)/2\big)\,\mathrm{I}\{u<z\}+f(u)\,\mathrm{I}\{u\ge z\}\tag{4.49}$$

and

$$h_z(u):=g_z(u)-f(z)-(u-z)f'(z)-(u-z)^2f''(z)/2\tag{4.50}$$

$$=\big(f(u)-f(z)-(u-z)f'(z)-(u-z)^2f''(z)/2\big)\,\mathrm{I}\{u\ge z\},$$

for all $u\in\mathbb{R}$. Then

$$g_z'(u)=\big(f'(z)+(u-z)f''(z)\big)\,\mathrm{I}\{u<z\}+f'(u)\,\mathrm{I}\{u\ge z\}$$

for all $u\in\mathbb{R}$. Since $f'$ is convex, $f'(z)+(u-z)f''(z)\le f'(u)$ and hence $g_z'(u)\le f'(u)$ for all $u\in\mathbb{R}$.

Moreover, $g_z'(u)$ is nonincreasing in $z$ for each $u\in\mathbb{R}$. Indeed, take any real $z_1$ and $z_2$ such that $z_1<z_2$. If $u\ge z_2$ then $g_{z_1}'(u)=f'(u)=g_{z_2}'(u)$, so that $g_{z_1}'(u)\ge g_{z_2}'(u)$. Next, if $z_1\le u<z_2$ then, by the convexity of $f'$, one has $g_{z_1}'(u)=f'(u)\ge f'(z_2)+(u-z_2)f''(z_2)=g_{z_2}'(u)$, so that again $g_{z_1}'(u)\ge g_{z_2}'(u)$. Finally, if $u<z_1$ then, bounding the terms $f'(z_1)$ and $(u-z_1)f''(z_1)$ separately from below in view of the conditions that $f'$ is convex and $f''$ is nondecreasing, one has

$$g_{z_1}'(u)=[f'(z_1)]+[(u-z_1)f''(z_1)]\ge[f'(z_2)+(z_1-z_2)f''(z_2)]+[(u-z_1)f''(z_2)]=f'(z_2)+(u-z_2)f''(z_2)=g_{z_2}'(u),$$

so that in this case as well $g_{z_1}'(u)\ge g_{z_2}'(u)$. Also, $g_z=f$ on $[z,\infty)$. It follows that

$$g_{z_2}\ge g_{z_1}\ge f\ \text{ on }\mathbb{R}\text{ for any real }z_1\text{ and }z_2\text{ such that }z_1<z_2.\tag{4.51}$$
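The ordering (4.51) can be illustrated with a concrete function. The sketch below is an assumption-laden example rather than part of the proof: it takes $f=\exp$, for which $f'$ and $f''$ are convex and nondecreasing, builds $g_z$ as in (4.49), and checks the ordering on a finite grid; the grid and the values of $z$ are arbitrary.

```python
import math

def f(u):
    """Sample test function f = exp (so f = f' = f'' are convex and nondecreasing)."""
    return math.exp(u)

def g(z, u):
    """g_z(u) from (4.49): quadratic Taylor polynomial of f at z for u < z, f(u) otherwise."""
    if u < z:
        s = u - z
        return math.exp(z) * (1.0 + s + s * s / 2.0)
    return math.exp(u)

# Grid on [-12, 4] and two levels z1 < z2 (illustrative choices).
grid = [-12.0 + 0.05 * i for i in range(321)]
z1, z2 = -3.0, -1.0

def sup_gap(z):
    """Largest value of g_z(u) - f(u) over the grid."""
    return max(g(z, u) - f(u) for u in grid)

ordered = all(g(z2, u) >= g(z1, u) - 1e-12 and g(z1, u) >= f(u) - 1e-12 for u in grid)
gaps = [sup_gap(z) for z in (-1.0, -3.0, -6.0)]
print(ordered, gaps)
```

On this grid the ordering $g_{z_2}\ge g_{z_1}\ge f$ should hold, and the gap between $g_z$ and $f$ should shrink as $z$ decreases, which is what the subsequent limit argument as $z\to-\infty$ relies on.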

Next, $h_z(u)=h_z'(u)=h_z''(u)=0$ for all $u\in(-\infty,z)$ and $h_z''(u)=(f''(u)-f''(z))\,\mathrm{I}\{u\ge z\}=(f''(u)-f''(z))_+$ for all $u\in\mathbb{R}$, since $f''$ is nondecreasing. Moreover, $h_z''$ is convex, since $f''$ is so. Therefore, by Proposition 1.1, $h_z\in\mathcal{H}^3_+$, which yields $\mathsf{E}h_z(\xi)\le\mathsf{E}h_z(\eta)$.

Assume now that $z\in(-\infty,0)$. Then, in view of (4.50),

$$g_z(u)=h_z(u)+u\big(f'(z)+|z|f''(z)\big)+u^2f''(z)/2+c(z)\tag{4.52}$$

for all u gR, where c{z) := f{z) - zf'{z) + z'^f"{z)/2. So, if / e J^l, then 
the established earlier inequality E {£) ^ Eh^ (g) yields now E g^ (^) ^ E (77) , 
because E^ < Er;, E^^ ^ Er;^, and the coefficients /'(z) + |z|/"(z) and f"{z)/2 
of u and on the right-hand-side of (4.52) are nonnegative; if E^ = Eg, then 
the sign of $f'(z)+|z|f''(z)$ does not matter, so that the same inequality $\mathsf{E}g_z(\xi)\le$
E gz {rf) will hold whenever / e J^\. . Similarly, if one has both equalities E ^ = 
Ery and E^^ = Etj^, then neither the sign of f'{z) + \z\f"{z) nor that of /"(z)/2 
matters, so that the inequality Egz{£,) ^ Egzirj) will hold whenever / e JF^ 

Now we are ready to complete the proof of parts (i), (ii) and (iii) of Lemma 4.5. 
Indeed, w.l.o.g. Ef(rj) < oo, for otherwise the inequality E /(^) ^ ^fiv) is triv- 
ial. Since /' is convex, there are some real a and b such that f'{x) ^ a+bx for all 
a; e M. Hence, f{x) ^ —c{l+x^) for some real c = c/ > and all a; e R. Now the 
condition E?7^ < oo implies that Ef{rf) > — oo, and so, E f(rj) G M. Therefore, 
in view of (4.49) and the condition Er/^ < oo, one has E 52(77) £ R. Now, letting 
z — > —00, observing that gz{u) — > f{u) for all u gR, and rccaUing (4.51), one 
concludes by dominated convergence that E (7^(77) — !■ E/(?7). Also, (4.51) implies 
that EgziO > ^fiO for aU z G R. RecaU now that EgzH) ^ ^gz{v) for ah 
2: e M. Thus, parts (i), (ii) and (iii) of Lemma 4.5 are proved. 

Let us prove part (iv) of Lemma 4.5. An idea here is to give the distribution 
of the r.v. 77 a heavy left tail, to which the cubic moment function x 1-^ x^ would 
be sensitive enough — in contrast with the moment functions a; 1— > {x — t)\ in 
Recall that 6x stands for the probability distribution concentrated at point 
X. Let 

(, ~ + |5i and ?7 ~ qS-^ + (| - q)S-i+e + 5^1, 

with w — > 00 and s := v~^/'^, and let q := 2{v'^'^{i-s)^) = be chosen 

so that 1 = E'ff = w^g + (—1 + e)^{^ — q) + ^. Then (eventually, as w — > 00) 
Et? = -q{v- 1 +e) + f > = E^, and E7;2 = 1 = E^^ 

It is not hard to see that E/(^) ^ E.f{rf) for all / G and hence for all 
/ G T-L\. Indeed, to see this it is enough to check that E(^ — t)\ ^ E(77 — t)\ 
for all t G M. If t ^ -1 + £, then trivially E(^ - t)\ = \{l - tf = E^r] - 1)%. If 
-l^t^-l + e, then E(^ - t)l = E{t] -t)l - - q){-l + e - tf ^ £(77 - t)l. 
li-v < t < -1, then E{S^-t)\ = E{T^-t)\ + {t+l)e-e'^ /2 + 0{e'^) < E{j^~t)\. 
Finally, \it<,-v, then E(^ - 1)\ = E{r] -t)\ + 2tEr] ^ E{r] - since E 77 > 0. 

Thus, all the conditions stated in the beginning of Lemma 4.5 are satisfied: 
E^<E77, E^2^E?72<oo, and E/(^) ^ £7(77) for all / G Til and hence for 
all / G H^, and one even has the equality E^^ = E?7^. Yet, E/*(^) > E/*(7y) 
for the cubic function /* defined by the formula f*{x) = x^ for all x G M, 
even though /, € J^l ^. Indeed, E/,(^) = E^^ = 0, while E/*(77) = Erj^ = 

-v^q + (-1 + e)^{^ - q) + ^ ^ -v^q ^ -11^/2 -00. The proof of part (iv) 
and hence that of the entire Lemma 4.5 is now complete. □ 

Proof of Lemma 4-6- This proof is almost identical to that of Lemma 4.5. Only 
two modifications are needed. First, in he beginning of the proof now we take 
any / in rather than in 

Second, note that the ".F^" condition that /"' is nondecreasing or, equiva- 
lently, that /" is convex was used in the proof of Lemma 4.5 only once — in the 

sentence "Moreover, /i" is convex, since /" is so." in the paragraph right after 
(4.51), to come, via Proposition 1.1, to the conclusion that hz G Hi^. Here, to 
come to the conclusion that $h_z\in\mathcal{H}^2_+$ we note instead that for all $u\in\mathbb{R}$

$$h_z'(u)=\big(f'(u)-f'(z)-(u-z)f''(z)\big)\,\mathrm{I}\{u\ge z\},$$

which implies that $h_z'$ is convex, since $f'$ is convex and the right derivative $(h_z')'(u)$ equals $0$ at $u=z$. □

Proof of Lemma ^.7. In view of Lemma 4.6(i), the relation C definition 
(1.10), and the Fubini theorem, it is enough to prove inequality (4.9) for all 
functions of the form u ^ {u — w)^ for w e M. By rescaling, w.l.o.g. y = 1. 
Further, r.v. T„2_jj^ + n^(, equals in distribution F + T^2_ij^ + E^^, where F is 
any r.v. such that F ~ N(0, cr^ — o-q) and F is independent of V^2_^^ and H^g. 
Now, conditioning on V^2_p^^ and 11^^ and using Jensen's inequality, one has 
E(F^2_^^ + - w)\ «C E(r^2_^„ + fl^„ - w)\ for aU w e R, so that w.l.o.g. 
(To = (7 and /3o < P ^ c^- Moreover, r.v.'s T^2_jj^ + 11^^ and T^2_jj + equal 
in distribution Td2 + W and 11^2 + W, respectively, where d := {(3 — Po)^^^ and 
W is any r.v. which is independent of r^2 and 11^2 and equals rg.2_^ + II^o in 
distribution. Thus, by conditioning on W, it suffices to prove that 

E(rd2 - w)l ^ E(nrf2 - w)l (4.53) 

for all d > and u> e M. Note that r£;2 and 11^2 are limits in distribution of 
Un := XliLi and y„ := Vi-n, respectively, as n ^ 00, where the Ui-n's 

are i.i.d. copies of ^ii^^ij^ and the Vi;„'s are i.i.d. copies of ^^2 1. By [2, 4], 
one has (4.53) with f/„ and Vn in place of V^2 and 0^2, respectively, provided 
that (P (so that ^di^^dj^ ^ 1 a.s.). 

Finally, it is clear that, for each $w\in\mathbb{R}$, $(x-w)_+^3=o(e^x)$ as $x\to\infty$. Hence and in view of (1.4), for each $w\in\mathbb{R}$ the sequences of r.v.'s $\big((U_n-w)_+^3\big)$ and $\big((V_n-w)_+^3\big)$ are uniformly integrable. Now (4.53) follows by a limit transition; see e.g. [7, Theorem 5.4]. □

Proof of Lemma 4-8. In view of Lemma 4.5(i), definition (1.10), and the Fubini 
theorem, it is enough to prove inequality (4.9) for all functions of the form 
{u — w)^ for w G M. By Lemma 4.4, E{X — w)^ < ^{Xa,b — w)^ for some a 
and b such that a > 0, 6 > 0, Xa,b ^ y a.s., E ^ = ab ^ <t^, and E{Xa,b)+ = 
at that, one of course also has EXa.b = 0. So, if one could prove inequality (4.10) 
with Xa,b in place of X and ab in place of a^, then it would remain to refer to 
Lemma 4.7. Thus, w.l.o.g. one has X = Xag^bo for some positive oq and bo, and 
at that 

EXl,,^=aobo = a' and E{X,,,bo)l = = P- 

Oo + do 

By rescaling, w.l.o.g. 

y = 1, whence 60 ^ 1- 
The main idea of the proof is to introduce a family of r.v.'s of the form

$$\eta_b:=X_{a(b),b}+\xi_{\tau(b)}\quad\text{for }b\in[\varepsilon,b_0],$$

where

$$\varepsilon:=\frac{b_0^2}{b_0+a_0}=\frac{\beta}{\sigma^2},\qquad a(b):=\frac{b}{\varepsilon}(b-\varepsilon),\qquad\tau(b):=a_0b_0-a(b)b,$$

$$\xi_t:=W_{(1-\varepsilon)t}+\tilde\Pi_{\varepsilon t},\qquad\tilde\Pi_t:=\Pi_t-\mathsf{E}\Pi_t,$$

$W_\cdot$ is a standard Wiener process, $\Pi_\cdot$ is a Poisson process with intensity 1, and $X_{a(b),b}$, $W_\cdot$, $\Pi_\cdot$ are independent for each $b\in[\varepsilon,b_0]$. Note that $\varepsilon\in(0,b_0)\subseteq(0,1)$; also, $\tau$ is decreasing and hence nonnegative on the interval $[\varepsilon,b_0]$, since $a(b)b$ is increasing in $b\in[\varepsilon,b_0]$ and $a(b_0)=a_0$.

Let further

$$\mathcal{E}(b):=\mathcal{E}(b,w):=\mathsf{E}(\eta_b-w)_+^3=\frac{b\,\mathsf{E}(\xi_{\tau(b)}-a(b)-w)_+^3+a(b)\,\mathsf{E}(\xi_{\tau(b)}+b-w)_+^3}{b+a(b)}.$$

Since $a(b_0)=a_0$ and $a(\varepsilon)=0$, one has $X_{a(\varepsilon),\varepsilon}=0$ a.s. Thus, Lemma 4.8 is reduced to the inequality $\mathcal{E}(\varepsilon)\ge\mathcal{E}(b_0)$. Note that $\mathcal{E}(b)$ is continuous in $b\in[\varepsilon,b_0]$; this follows because of the uniform integrability (cf. the last paragraph in the proof of Lemma 4.7). So, it is enough to show that the left derivative $\mathcal{E}'(b)$ of $\mathcal{E}(b)$ is no greater than $0$ for all $b\in(\varepsilon,b_0)$. To compute this derivative, one can use the following
Lemma 4.10. Consider any function f : (e, 60) x K 9 (6, x) 1-^ f{b, x) G R such 
that \fi;,{b,x)\ + \f{;,{b,x)\ + \fj,{b,x)\ + \f,{b,x)\ ^ Cfe\-\ and \fj,{b,x,) - 
f"^{b,X2)\ < C/|a;i — a;2|(el^^l + el^^l) /or 5ome constant Cf, allbG {£,bo), and 
all x, xi,X2 in M. Then for all b G (e, 60) 

E/(6-/t,Cr(6-fe))-E/(b,Cr(6)) _ p p ^. . ^ 

Ijm EF/(6,^.(,)), 

where 

Ff{b, x) := fiib, x) + (iillfl^M a; + 1) - f{b, x) - f^b, x))) T'{b). 

The proof of this lemma, which involves little more than routine Taylor ex- 
pansions, will be given later in this paper. 

By Lemma 4.10, for all b G (£,60) one has £'{b) = EG{b,£,T{b) — w), where 

r^/u \ ( b \' I o ^ J- ^1, ^\ , bFf^{b,x) + a{b)Ff^{b,x) 

f^(b,x) := (x-a{b))l, f2(b,x) {x + b)l. 

Thus, it suffices to show that G(6, u) ^ for all b G (£, 60) and m G R. 
Observe now that = a'{b) = 1 + M^, r'{b) = -(3a(6) + b), 
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and e — jq^^^jj- Substituting into (4.2) these expressions of (^+^(5) '2'(^)> T'{b), 

and e in terms of only b and a{b), one has (6 + a{b))^G{b, u) = —G{a{b), b, —u), 
where 

G{a, b, t):={a + b- 3ab^ - 6^) {-a - t)l 
- (a + 6 + 3aH'^ + ab^) {b - t)l 
+ 62(3a + b) (6(1 -a-t)l+a{l + b- t)l) 
+ 3 (2a2 + 3ab + b'^ - Sab^ - b^) (-a - t)l 
-3a{a + b + 3ab^ + b^) {b - t)l 
+ 3(3a + b){a + b- 6^) {b{-a - t)+ + a{b - t)+) . 

To complete the proof of the theorem, it is enough to show that $G(a,b,t)\ge0$ for all $a>0$, $b\in(0,1]$, and $t\in\mathbb{R}$. At that, w.l.o.g. $t<1+b$, since $G(a,b,t)=0$ for all $a>0$, $b\in(0,1]$, and $t\ge1+b$. Next, one has either $a+b\le1$ or $a+b>1$. In the first case, $-a\le b\le1-a\le1+b$, while in the second case $-a\le1-a\le b\le1+b$. Therefore, it remains to verify that $G(a,b,t)\ge0$ in each of the following 8 (sub)cases:

Case 10: $a>0$ & $b>0$ & $a+b\le1$ & $t\le-a$;
Case 11: $a>0$ & $b>0$ & $a+b\le1$ & $-a\le t\le b$;
Case 12: $a>0$ & $b>0$ & $a+b\le1$ & $b\le t\le1-a$;
Case 13: $a>0$ & $b>0$ & $a+b\le1$ & $1-a\le t\le1+b$;
Case 20: $a>0$ & $0<b\le1$ & $a+b>1$ & $t\le-a$;
Case 21: $a>0$ & $0<b\le1$ & $a+b>1$ & $-a\le t\le1-a$;
Case 22: $a>0$ & $0<b\le1$ & $a+b>1$ & $1-a\le t\le b$;
Case 23: $a>0$ & $0<b\le1$ & $a+b>1$ & $b\le t\le1+b$.

In Case 10, G{a, b, t) = a^(5a^ + 8a6 + 36^), which is obviously positive for 
all positive a and b. 
In Case 11, 

G{a,b,t) = Gii{a,b,t) := 

9a^b + Ua'^b'^ + 3ab^ - 9a%^ + 90^6^ - 3a%^ - 3ab'^ + 3a%'^ - a%'^ 
+ (-9a^ - QaH + Qab^ + 36^ - 9ab^ + 18a%^ - 9a%^ - 3b'^ + 6ab'^ - 3a%'^)t 
+ {-30^ - Gab - 36^ + 9ab^ - 9aH^ + 3b'^ - 3ab'^)t^ 

+ {a + b- 3ab^ - b'^)t^, 

which is positive in this case, Case 11; this can be verified using a Mathematica command of the form Reduce[G11 <= 0 && case11, Reals], which outputs False (in about 13 seconds).

The other 6 cases are verified similarly; the longest of them in terms of the 
time it takes Mathematica to check is Case 21 (in about 13 seconds). This 
concludes the proof of Lemma 4.8. □ 
Proof of Lemma 4.9. To begin, let indeed $\beta=\varepsilon\sigma^2y$ and take any natural number $m$ and any positive real numbers $a$ and $b$. Let then $n:=2m$, and let indeed $X_1,\dots,X_n$ be any independent r.v.'s such that $X_1,\dots,X_m$ are independent copies of $X_{b/\sqrt{m},\,b/\sqrt{m}}$ and $X_{m+1},\dots,X_{2m}$ are independent copies of $X_{a/m,\,y}$. Then, of course, $\mathsf{E}X_i=0$ for all $i$. Next, the system of equations $\sum_{i=1}^n\mathsf{E}X_i^2=\sigma^2$ and $\sum_{i=1}^n\mathsf{E}(X_i)_+^3=\beta$ (cf. (2.3)) can be rewritten (in view of (2.2)) as $b^2+ay=\sigma^2$ and $\frac{b^3}{2\sqrt{m}}+\frac{m\,ay^3}{a+my}=\varepsilon\sigma^2y$, and then in turn as $a=(\sigma^2-b^2)/y$ and $b^2=(1-\varepsilon)\sigma^2+r_m(b)$, where $r_m(b)$ is a certain expression in terms of $m$, $b$, $\sigma$, $y$, and $\varepsilon$ (but not containing entries of $a$) such that $r_m(b)\to0$ uniformly in $b\in[0,\sigma]$ as $m\to\infty$ (recall that $\sigma$, $y$, and $\varepsilon$ were fixed) and $r_m(b)$ is continuous in $b\in[0,\sigma]$. It follows that for all large enough $m\in\{1,2,\dots\}$ the equation $b^2=(1-\varepsilon)\sigma^2+r_m(b)$ has a solution $b=b_m\in[0,\sigma]$, and at that $b\to\sigma\sqrt{1-\varepsilon}$ and hence $a=(\sigma^2-b^2)/y\to\varepsilon\sigma^2/y$ as $m\to\infty$. In particular, this implies the statement indented in the formulation of Lemma 4.9. The convergence of $S$ in distribution to $\Gamma_{(1-\varepsilon)\sigma^2}+y\tilde\Pi_{\varepsilon\sigma^2/y^2}$ as $m\to\infty$ can now be easily obtained, via either characteristic functions or well-known ready-to-use limit theorems. □

Proof of Lemma 4-10. First, take any 6 e (e, 6o) and h & (0, 6 — e), and write 

E f{h - h, ^r(b-h)) - E fib, ^rib)) =£i+ £2, (4.54) 

where 

5i := E fib - h, n-ft) - E fib, n-fc), 
£2 ■.= Efib,Yt,-h) - E/(6,y6) = EgiVb-h) - Eff(n), 
Yb ■■= ^T{b), 9ix) ■■= Qbix) ■= fib, x); 

note that 

E(n-/, - n)' = T{b - h) - T(b) = o{h) (4.55) 

and, using the Jensen inequality (as in the proof of Lemma 4.7), 

Ee^l'^"! < Ee^l^"-"! < Ee^l'^^l < 00 for all A > 0. (4.56) 

Next, for all real x and y, all b € is, bo), and all h G (0,6 — e) there exists 
some 6 G (0, 1) such that 

fib -h,y)- fib, y) = -flib, y)h + y /^',(& - Oh, y) 

= -fiib, x)h + Oiih\x -y\ + h^)ie\-\ + el^l)) , (4.57) 

since 1/6^(6, a;)| + \fl,'^ib,x)\ < C/el^l for all a; G M. Using (4.57) with Yb-h and 
yj, in place of y and x, respectively, and also the Cauchy-Schwarz inequality 
together with (4.55) and (4.56), one has 

£i = -hEfiib,Yb) 

+ 0(h^EiYb-h - Yb)^ E(e2|ni + e^l^--.!) + E(el^^l + el'^^-'^l)) 
= -/iE/^(6,n) + 0(/i=^/2). (4.58) 
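Further,

$$\mathcal{E}_2=\int_{-\infty}^{\infty}\mathsf{P}(Y_b\in dx)\,\big(\mathsf{E}g(x+\eta+\nu-\mathsf{E}\nu)-g(x)\big),\tag{4.59}$$

where $\eta$ and $\nu$ are independent r.v.'s, $\eta\sim N(0,(1-\varepsilon)\Delta\tau)$, $\nu\sim\mathrm{Pois}(\varepsilon\Delta\tau)$, $\Delta\tau:=\tau(b-h)-\tau(b)=-h\tau'(b)+O(h^2)$. Note that

$$\begin{aligned}
\mathsf{P}(\nu=0)&=e^{-\varepsilon\Delta\tau}=1-\varepsilon\Delta\tau+O(\Delta\tau^2)=1+\varepsilon\tau'(b)h+O(h^2)=1+O(h),\\
\mathsf{P}(\nu=1)&=e^{-\varepsilon\Delta\tau}\varepsilon\Delta\tau=\varepsilon\Delta\tau+O(\Delta\tau^2)=-\varepsilon\tau'(b)h+O(h^2)=O(h),\\
\mathsf{P}(\nu\ge2)&=O(h^2),\\
\mathsf{E}\nu&=\varepsilon\Delta\tau=-\varepsilon\tau'(b)h+O(h^2),\\
\mathsf{E}\eta^2&=(1-\varepsilon)\Delta\tau=-(1-\varepsilon)\tau'(b)h+O(h^2),\\
\mathsf{E}|\eta|^m&=\Theta(\Delta\tau^{m/2})=O(h^{m/2})\ \text{ for all }m\in(0,6].
\end{aligned}\tag{4.60}$$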



Using some of these estimates together with the conditions \f^x{b, xi)~f^^{b, a;2)| ^ 
Cf\xi - a;2|(el^il + e'^^l) and \f'J^{b,x)\ + \ f!,{b,x)\ ^ Cfe^""^, as well as the 
Cauchy-Schwarz inequality (of. (4.58)), one has 

Egix + r,-Eu)- g{x) = -g'{x) Ev + ^ (E if + E^ v) 

+ o( E + E^ v){e\^+^-^''\ + e'^l))) 

= (g\x)e-^^{l-e))T\b)h^O{h'l\\-\), 
Eg{x^-^^-\-Ev)- g{x) =g{x^\)-g(x) 

+ 0( E ((|r/| + Ej.)(el^+''+i-^"l + el^+^l))) 
= g(a; + l)-g(x)+0(/ii/2el-l); 
E {g{x ^^^v-Ev)- g{x)f = o( E(|r/| +iy+Euf E (el^+'?+''-EH + e'^l)^) 

= 0(e2|-l). 

Hence and by (4.60), 

Eg{x + r] + u-Eu)- g{x) = P{v = Q){Eg{x + r] - Ev) - g{x)) 

+ P{u = l){Eg{x + v + l-Eiy)- g{x)) 
+ 0{P{u > 2)el^l) 

= [g'{x)e - ^(1 - £) - £ (5(0; + 1) - g{x)))T'{b)h 
+ 0(/i=^/2el^l). 
Now, in view of (4.59) and (4.56),

$$\mathcal{E}_2=\mathsf{E}\Big(g'(Y_b)\,\varepsilon-\frac{g''(Y_b)}{2}(1-\varepsilon)-\varepsilon\big(g(Y_b+1)-g(Y_b)\big)\Big)\tau'(b)h+O(h^{3/2}).$$

This, together with (4.58) and (4.54), completes the proof of Lemma 4.10. □